Batch Contextual Epsilon-Greedy Policy
Source: R/armed_bandit_helpers.R
Details
Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.
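The exploration rule itself can be illustrated with a minimal base R sketch. The helper `choose_arm()` and its `expected_rewards` argument are hypothetical names introduced here for illustration; they are not part of this class's API, and the class may break ties or draw random numbers differently.

```r
# Hypothetical helper (not part of the package): pick an arm index
# given a numeric vector of per-arm reward estimates.
choose_arm <- function(expected_rewards, epsilon = 0.1) {
  k <- length(expected_rewards)
  if (runif(1) < epsilon) {
    # Explore: draw an arm uniformly at random.
    sample.int(k, 1)
  } else {
    # Exploit: take the arm with the highest estimate (first one on ties).
    which.max(expected_rewards)
  }
}

# With epsilon = 0 the rule never explores and picks the best estimate.
greedy_choice <- choose_arm(c(0.2, 0.7, 0.5), epsilon = 0)

# With epsilon = 1 the rule always explores.
random_choice <- choose_arm(c(0.2, 0.7, 0.5), epsilon = 1)
```

Setting `epsilon = 0` gives a purely greedy policy, while `epsilon = 1` gives uniform random selection; values in between trade exploration against exploitation.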
Public fields
epsilon: Probability of selecting a random arm (exploration rate).
batch_size: Number of rounds per batch before updating model parameters.
A_cc: List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc: List of reward-weighted context sums (one per arm), updated batch-wise.
class_name: Internal class name identifier.
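To show how per-arm Gram matrices and reward-weighted context sums can yield a reward estimate, here is a minimal sketch in base R. It assumes a linear reward model with identity-initialised Gram matrices (a ridge-style choice); the helpers `accumulate()` and `predict_arm()`, the dimensions `d` and `k`, and the initial values are all assumptions made for illustration, not the package's confirmed implementation.

```r
d <- 2  # context dimension (assumed for this example)
k <- 3  # number of arms (assumed for this example)

# One Gram matrix and one reward-weighted context sum per arm.
# Identity / zero initialisation is an assumption, not confirmed by the source.
A_cc <- replicate(k, diag(d), simplify = FALSE)
b_cc <- replicate(k, rep(0, d), simplify = FALSE)

# Hypothetical helper: fold one observation (context x, reward r) into A and b.
accumulate <- function(A, b, x, r) {
  list(A = A + outer(x, x), b = b + r * x)
}

# Hypothetical helper: estimated reward for context x, computed as x' A^{-1} b.
predict_arm <- function(A, b, x) {
  theta <- solve(A, b)  # solves A %*% theta = b
  sum(x * theta)
}

# Arm 1 observes context (1, 0) with reward 1.
upd <- accumulate(A_cc[[1]], b_cc[[1]], x = c(1, 0), r = 1)
A_cc[[1]] <- upd$A
b_cc[[1]] <- upd$b

estimate <- predict_arm(A_cc[[1]], b_cc[[1]], x = c(1, 0))
```

Storing only `A` and `b` per arm keeps memory use independent of the number of rounds, since the coefficient estimate can be recomputed from these two quantities at any time.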
Methods
Method new()
Constructor for the Batch Contextual Epsilon-Greedy policy.
Usage
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
Method set_reward()
Updates the model statistics from the observed reward. Parameter updates are applied once per batch, that is, every batch_size rounds.
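One plausible way to organise a once-per-batch update is sketched below for a single arm. It is an assumption, not the package's confirmed logic: the function `batch_update()`, the round counter `t`, and the `A` / `b` entries (the parameters used for acting) are hypothetical names, and the reading of `A_cc` / `b_cc` as running accumulators is inferred from the field descriptions only.

```r
# Hypothetical single-arm update: accumulate every round, refresh the
# parameters used for action selection only when a batch completes.
batch_update <- function(state, t, batch_size, x, r) {
  # Always fold the new observation into the running statistics.
  state$A_cc <- state$A_cc + outer(x, x)
  state$b_cc <- state$b_cc + r * x

  # At a batch boundary, copy the running statistics into the acting parameters.
  if (t %% batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

# Identity / zero initialisation is assumed for this example.
state <- list(A = diag(2), b = c(0, 0), A_cc = diag(2), b_cc = c(0, 0))

# Three rounds with batch_size = 2: the acting parameters refresh at t = 2 only.
for (t in 1:3) {
  state <- batch_update(state, t, batch_size = 2, x = c(1, 0), r = 1)
}
```

After the loop, `state$b_cc` reflects all three rounds while `state$b` still reflects the first two, which is the lag a batched policy accepts in exchange for fewer parameter refreshes. With `batch_size = 1` the two always coincide.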