Batch Contextual Epsilon-Greedy Policy
Source: R/armed_bandit_helpers.R (documented in BatchContextualEpsilonGreedyPolicy.Rd)
Details
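Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates. In each round the policy selects a random arm with probability epsilon and otherwise exploits its current per-arm reward estimates. Those estimates are refreshed only once per batch, that is, every batch_size rounds.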
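The selection rule can be sketched in base R as follows. This is a minimal sketch, not the package's code. It assumes that each arm a keeps a ridge-style linear model with a Gram matrix A[[a]] and a reward-weighted context sum b[[a]] (mirroring the A_cc and b_cc fields), and that the estimated reward for context x is x' A^{-1} b. The function name select_arm_eps_greedy is hypothetical.

```r
# Hypothetical helper: epsilon-greedy arm selection for a contextual bandit.
# Assumption: arm a's estimated reward for context x is sum(x * solve(A[[a]], b[[a]])).
select_arm_eps_greedy <- function(x, A, b, epsilon) {
  k <- length(A)
  if (runif(1) < epsilon) {
    # Explore: pick an arm uniformly at random
    return(sample.int(k, 1))
  }
  # Exploit: compute each arm's estimated reward under its linear model
  est <- vapply(seq_len(k), function(a) {
    theta <- solve(A[[a]], b[[a]])
    sum(x * theta)
  }, numeric(1))
  # Return the best arm, breaking ties at random
  best <- which(est == max(est))
  best[sample.int(length(best), 1)]
}

set.seed(1)
d <- 2; k <- 3
A <- replicate(k, diag(d), simplify = FALSE)  # identity Gram matrices
b <- list(c(0, 0), c(1, 0), c(0, 0.5))        # with A = I, theta_a equals b_a
x <- c(1, 1)
select_arm_eps_greedy(x, A, b, epsilon = 0)   # always exploits: arm 2
```

With epsilon = 0 the rule is purely greedy; with epsilon = 1 it reduces to uniform random selection.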
Public fields
epsilon
Probability of selecting a random arm (exploration rate).
batch_size
Number of rounds per batch before updating model parameters.
A_cc
List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc
List of reward-weighted context sums (one per arm), updated batch-wise.
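Statistics of the kind stored in A_cc and b_cc are typically accumulated as sketched below for a linear (ridge) reward model: the Gram matrix gains the outer product x x' and the reward-weighted sum gains r x, and the coefficient estimate is A^{-1} b. This is a sketch under stated assumptions. The identity and zero initialisation and the helper names init_stats, update_stats and estimate_theta are hypothetical, and the package's own initialisation may differ.

```r
# Hypothetical helpers illustrating per-arm sufficient statistics.
# Assumption: A starts as the d x d identity and b as a zero vector.
init_stats <- function(k, d) {
  list(A = replicate(k, diag(d), simplify = FALSE),
       b = replicate(k, numeric(d), simplify = FALSE))
}

update_stats <- function(stats, arm, x, reward) {
  # Gram matrix: add the outer product x %*% t(x)
  stats$A[[arm]] <- stats$A[[arm]] + tcrossprod(x)
  # Reward-weighted context sum: add reward * x
  stats$b[[arm]] <- stats$b[[arm]] + reward * x
  stats
}

# Ridge estimate of the arm's coefficients, theta = A^{-1} b
estimate_theta <- function(stats, arm) solve(stats$A[[arm]], stats$b[[arm]])

stats <- init_stats(k = 2, d = 2)
stats <- update_stats(stats, arm = 1, x = c(1, 0), reward = 1)
stats <- update_stats(stats, arm = 1, x = c(1, 0), reward = 1)
estimate_theta(stats, arm = 1)  # c(2/3, 0)
```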
class_name
Internal class name identifier.
Methods
Method new()
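Creates a new BatchContextualEpsilonGreedyPolicy object with exploration rate epsilon and batch size batch_size. With the default batch_size = 1, model parameters are updated after every round.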
Usage
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
Method set_reward()
Updates the model statistics using the observed reward. The updates take effect once per batch, that is, every batch_size rounds.
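One way to realise once-per-batch updates is sketched below. This is a hypothetical design, not taken from the package source. Every reward is folded into running statistics A_cc and b_cc at once, but the copies A and b used for arm selection are refreshed only when the round counter t is a multiple of batch_size. The names new_batch_state and set_reward_batched are illustrative only.

```r
# Hypothetical sketch of batched updating (assumed design, not the package's code).
new_batch_state <- function(k, d, batch_size) {
  A0 <- replicate(k, diag(d), simplify = FALSE)
  b0 <- replicate(k, numeric(d), simplify = FALSE)
  list(batch_size = batch_size,
       A_cc = A0, b_cc = b0,  # running statistics, updated every round
       A = A0, b = b0)        # statistics used for arm selection
}

set_reward_batched <- function(state, t, arm, x, reward) {
  # Accumulate the new observation immediately
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  if (t %% state$batch_size == 0) {
    # End of batch: publish the accumulated statistics
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

state <- new_batch_state(k = 2, d = 2, batch_size = 3)
for (t in 1:2) state <- set_reward_batched(state, t, arm = 1, x = c(1, 0), reward = 1)
identical(state$b[[1]], c(0, 0))  # TRUE: batch not yet complete
state <- set_reward_batched(state, 3, arm = 1, x = c(1, 0), reward = 1)
state$b[[1]]                      # c(3, 0)
```

With batch_size = 1 the condition t %% 1 == 0 always holds, so the sketch degenerates to ordinary per-round updating.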