Batch Disjoint LinUCB Policy with Epsilon-Greedy
Source: R/armed_bandit_helpers.R
BatchLinUCBDisjointPolicyEpsilon.Rd
Details
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.
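The selection rule described above can be illustrated with a minimal base-R sketch. It is not the package's code: the helper name `select_arm` and its argument layout (lists `A` and `b` holding one Gram matrix and one reward-weighted context vector per arm) are assumptions made for illustration, and the score used is the standard disjoint LinUCB form, the estimated reward plus `alpha` times the confidence width.

```r
# Hypothetical helper, not part of the package: disjoint LinUCB scoring
# combined with an epsilon-greedy override.
select_arm <- function(A, b, x, alpha = 1.0, epsilon = 0.1) {
  k <- length(A)
  # With probability epsilon, ignore the scores and explore uniformly.
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))
  }
  ucb <- vapply(seq_len(k), function(a) {
    A_inv <- solve(A[[a]])                 # inverse Gram matrix for arm a
    theta <- A_inv %*% b[[a]]              # ridge-regression coefficients
    est   <- sum(theta * x)                # estimated reward for context x
    width <- alpha * sqrt(drop(t(x) %*% A_inv %*% x))  # confidence width
    est + width
  }, numeric(1))
  # Pick the highest score, breaking ties at random.
  best <- which(ucb == max(ucb))
  best[sample.int(length(best), 1)]
}

# Example: two arms, two context features; arm 2 has seen a positive reward.
A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(1, 0))
chosen <- select_arm(A, b, x = c(1, 0), alpha = 1, epsilon = 0)
```

With `epsilon = 0` the choice is purely UCB-driven, so the example always selects arm 2; raising `epsilon` mixes in uniformly random arms.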
Methods
- `initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)`: Constructor.
- `set_parameters(context_params)`: Initializes sufficient statistics for each arm.
- `get_action(t, context)`: Selects an arm using UCB scores and epsilon-greedy rule.
- `set_reward(t, context, action, reward)`: Updates statistics and refreshes model at batch intervals.
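As a rough illustration of what "sufficient statistics for each arm" might look like, the sketch below builds one Gram matrix and one reward-weighted context vector per arm. The helper `init_arm_stats` and its arguments `k` (number of arms) and `d` (context dimension) are hypothetical, and starting from an identity matrix and a zero vector is the usual LinUCB convention rather than something this page confirms.

```r
# Hypothetical helper, not part of the package: per-arm starting statistics
# for disjoint LinUCB (assumed identity Gram matrix and zero vector).
init_arm_stats <- function(k, d) {
  list(
    A = replicate(k, diag(d), simplify = FALSE),    # one d x d matrix per arm
    b = replicate(k, numeric(d), simplify = FALSE)  # one length-d vector per arm
  )
}

# Example: three arms, two context features.
stats <- init_arm_stats(k = 3, d = 2)
```

Keeping the statistics in per-arm lists mirrors the "disjoint" part of the name: no parameters are shared between arms.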
Public fields
alpha
Numeric, UCB exploration strength parameter.
epsilon
Numeric, probability of taking a random exploratory action.
batch_size
Integer, number of rounds per batch update.
A_cc
List of Gram matrices per arm, accumulated across batch.
b_cc
List of reward-weighted context vectors per arm.
class_name
Internal class name identifier.
Methods
Method new()
Constructor for batched LinUCB with epsilon-greedy exploration.
Usage
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
Method set_reward()
Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.
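The batching behaviour can be sketched in base R as follows. This is an assumption-laden illustration, not the package's implementation: `make_batch_updater` is a hypothetical helper, the accumulators are named after the `A_cc` and `b_cc` fields, and the rule "refresh when `t` is a multiple of `batch_size`" is one plausible reading of "at the end of a batch".

```r
# Hypothetical helper, not part of the package: accumulate per-arm statistics
# every round, but expose them for action selection only at batch boundaries.
make_batch_updater <- function(k, d, batch_size = 1) {
  A_cc <- replicate(k, diag(d), simplify = FALSE)     # accumulated Gram matrices
  b_cc <- replicate(k, numeric(d), simplify = FALSE)  # accumulated reward vectors
  A <- A_cc                                           # statistics in use
  b <- b_cc
  list(
    update = function(t, x, arm, reward) {
      A_cc[[arm]] <<- A_cc[[arm]] + outer(x, x)       # add x x^T
      b_cc[[arm]] <<- b_cc[[arm]] + reward * x        # add reward-weighted context
      if (t %% batch_size == 0) {                     # assumed batch boundary
        A <<- A_cc
        b <<- b_cc
      }
      invisible(NULL)
    },
    current = function() list(A = A, b = b)
  )
}

# Example: with batch_size = 2, the first observation is not visible yet.
u <- make_batch_updater(k = 2, d = 2, batch_size = 2)
u$update(t = 1, x = c(1, 0), arm = 1, reward = 1)
before <- u$current()$b[[1]]
u$update(t = 2, x = c(1, 0), arm = 1, reward = 1)
after <- u$current()$b[[1]]
```

With `batch_size = 1` the refresh happens every round, which reduces to the ordinary online update.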