
Batch Disjoint LinUCB Policy with Epsilon-Greedy

Details

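Implements the disjoint LinUCB algorithm, in which each arm has its own linear model and is scored by an upper confidence bound (UCB), combined with epsilon-greedy exploration. Model parameters are refreshed in batches rather than after every round.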
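For reference, the standard disjoint LinUCB score with an epsilon-greedy wrapper is written out below. This is the textbook formulation and is only assumed to match this class; in the batched variant, $A_a$ and $b_a$ are taken here to be the statistics as of the most recent batch boundary, with $\mathcal{S}_a$ the rounds up to that boundary in which arm $a$ was chosen.

```latex
\begin{aligned}
A_a &= I_d + \sum_{s \in \mathcal{S}_a} x_{s,a} x_{s,a}^\top, \qquad
b_a = \sum_{s \in \mathcal{S}_a} r_s \, x_{s,a}, \qquad
\hat\theta_a = A_a^{-1} b_a, \\
p_{t,a} &= x_{t,a}^\top \hat\theta_a + \alpha \sqrt{x_{t,a}^\top A_a^{-1} x_{t,a}}, \\
a_t &= \begin{cases}
  \arg\max_a p_{t,a} & \text{with probability } 1 - \epsilon, \\
  \text{a uniformly random arm} & \text{with probability } \epsilon.
\end{cases}
\end{aligned}
```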

Methods

- `initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)`: Constructor.
- `set_parameters(context_params)`: Initializes the sufficient statistics for each arm.
- `get_action(t, context)`: Selects an arm using UCB scores and the epsilon-greedy rule.
- `set_reward(t, context, action, reward)`: Updates the statistics and refreshes the model at batch intervals.
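To show how the four methods fit together over a run, here is a hypothetical base-R stand-in: an environment with closures, not the package's R6 class. It assumes a shared numeric context vector `x` for all arms in place of the package's `context` list, passes `d` and `k` directly instead of `context_params`, and uses a plain number as the reward instead of a reward list. `make_policy` and `theta_true` are illustrative names.

```r
# Hypothetical base-R stand-in for the four methods (NOT cramR's implementation).
# Assumption: every arm sees the same d-dimensional numeric context x.
make_policy <- function(alpha = 1.0, epsilon = 0.1, batch_size = 1) {
  self <- new.env()
  self$alpha <- alpha; self$epsilon <- epsilon; self$batch_size <- batch_size

  self$set_parameters <- function(d, k) {
    self$A_cc <- replicate(k, diag(d), simplify = FALSE)    # per-arm Gram matrices
    self$b_cc <- replicate(k, numeric(d), simplify = FALSE) # per-arm sums of r * x
    self$A <- self$A_cc                                     # snapshot used for scoring
    self$b <- self$b_cc
  }

  self$get_action <- function(t, x) {
    k <- length(self$A)
    if (runif(1) < self$epsilon) return(sample.int(k, 1))   # explore: random arm
    scores <- vapply(seq_len(k), function(a) {
      A_inv <- solve(self$A[[a]])
      drop(crossprod(x, A_inv %*% self$b[[a]])) +           # x' theta_hat
        self$alpha * sqrt(drop(crossprod(x, A_inv %*% x)))  # + alpha * confidence width
    }, numeric(1))
    which.max(scores)                                       # exploit: highest UCB
  }

  self$set_reward <- function(t, x, action, reward) {
    self$A_cc[[action]] <- self$A_cc[[action]] + outer(x, x)
    self$b_cc[[action]] <- self$b_cc[[action]] + reward * x
    if (t %% self$batch_size == 0) {                        # end of batch: refresh snapshot
      self$A <- self$A_cc
      self$b <- self$b_cc
    }
  }
  self
}

set.seed(7)
d <- 3; k <- 2
theta_true <- list(c(1, 0, 0), c(0, 0, 1))  # hypothetical per-arm coefficients
policy <- make_policy(alpha = 0.5, epsilon = 0.1, batch_size = 10)
policy$set_parameters(d, k)
for (step in 1:200) {
  x <- runif(d)
  a <- policy$get_action(step, x)
  r <- sum(x * theta_true[[a]]) + rnorm(1, sd = 0.1)
  policy$set_reward(step, x, a, r)
}
```

Because the scoring snapshot `A`/`b` changes only at batch boundaries, arms are scored with statistics that can lag the accumulators by up to `batch_size - 1` rounds in this sketch.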

Super class

cramR::NA

Public fields

alpha

Numeric, UCB exploration strength parameter.

epsilon

Numeric, probability of taking a random exploratory action.

batch_size

Integer, number of rounds per batch update.

A_cc

List of per-arm Gram matrices, accumulated across the batch.

b_cc

List of per-arm sums of reward-weighted context vectors.
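In standard disjoint LinUCB, an arm's Gram matrix starts at the identity and accumulates the outer products of the contexts that arm received, while the companion vector accumulates reward times context; solving the two gives a ridge-regression estimate of the arm's coefficients. The sketch below assumes `A_cc` and `b_cc` follow that standard form, which is not confirmed from the cramR source; `theta_true`, `x_hist`, and `r_hist` are hypothetical data for a single arm.

```r
# Sketch (assumed standard LinUCB form): per-arm sufficient statistics
# A = I + sum(x x') and b = sum(r x), and the ridge estimate they encode.
set.seed(42)
d <- 2
theta_true <- c(2, -1)                      # hypothetical true coefficients for one arm
x_hist <- matrix(runif(500 * d), ncol = d)  # 500 hypothetical contexts for this arm
r_hist <- drop(x_hist %*% theta_true) + rnorm(500, sd = 0.05)

A <- diag(d)      # identity start (ridge penalty of 1)
b <- numeric(d)
for (i in seq_len(nrow(x_hist))) {
  x <- x_hist[i, ]
  A <- A + outer(x, x)         # accumulate Gram matrix
  b <- b + r_hist[i] * x       # accumulate reward-weighted context
}

theta_hat <- solve(A, b)       # ridge regression estimate of the arm's coefficients
print(round(theta_hat, 2))
```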

class_name

Internal class name identifier.

Methods

Method new()

Constructor for batched LinUCB with epsilon-greedy exploration.

Usage

BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)

Arguments

alpha

Numeric. UCB width parameter (exploration strength).

epsilon

Numeric. Probability of selecting a random arm.

batch_size

Integer. Number of rounds before updating parameters.


Method set_parameters()

Initializes the arm-specific parameter containers.

Usage

BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)

Arguments

context_params

List containing at least `unique` (feature size) and `k` (number of arms).


Method get_action()

Chooses an arm with the epsilon-greedy rule: with probability `epsilon` it selects a random arm; otherwise it selects the arm with the highest UCB score.

Usage

BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)

Arguments

t

Integer timestep.

context

List containing the context for the decision.

Returns

A list with the selected action.
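The selection step can be sketched in base R as follows. This assumes the standard disjoint LinUCB score (predicted reward plus `alpha` times the confidence width) and a single numeric context vector `x` shared by all arms; `ucb_score` and `choose_arm` are hypothetical helpers, not functions exported by cramR. The first call shows that an untried arm (identity Gram matrix, zero reward vector) scores purely on its exploration bonus.

```r
# Hypothetical helpers sketching the selection rule (not cramR's code).
ucb_score <- function(A, b, x, alpha) {
  A_inv <- solve(A)
  theta_hat <- drop(A_inv %*% b)                   # ridge estimate for this arm
  mean_part <- sum(x * theta_hat)                  # predicted reward x' theta_hat
  width <- sqrt(drop(crossprod(x, A_inv %*% x)))   # confidence width sqrt(x' A^-1 x)
  mean_part + alpha * width
}

choose_arm <- function(A_list, b_list, x, alpha, epsilon) {
  k <- length(A_list)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))                       # explore: uniformly random arm
  }
  scores <- vapply(seq_len(k),
                   function(a) ucb_score(A_list[[a]], b_list[[a]], x, alpha),
                   numeric(1))
  which.max(scores)                                # exploit: highest UCB (first on ties)
}

# An untried arm scores alpha * ||x||: here 1 * sqrt(3^2 + 4^2).
x <- c(3, 4)
ucb_score(diag(2), c(0, 0), x, alpha = 1)          # returns 5

# With epsilon = 0 the rule is purely greedy on UCB: arm 2 has theta_hat = (10, 0),
# so it scores 30 + 5 = 35 versus 5 for the untried arm 1.
choose_arm(list(diag(2), diag(2)), list(c(0, 0), c(10, 0)), x, alpha = 1, epsilon = 0)  # returns 2
```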


Method set_reward()

Updates the arm-specific sufficient statistics with the observed reward. Parameter updates occur only at the end of a batch.

Usage

BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)

Arguments

t

Integer timestep.

context

Context object used for decision-making.

action

List containing the chosen action.

reward

List containing the observed reward.

Returns

Updated internal model parameters.
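The batching behavior of `set_reward()` can be illustrated with a small base-R sketch. It assumes, without confirmation from the cramR source, that statistics are accumulated every round and that a batch ends on rounds where `t %% batch_size == 0`, at which point the accumulated `A_cc`/`b_cc` are copied into the snapshot used for scoring. `record_reward`, `state`, and `last_refresh` are hypothetical names.

```r
# Hypothetical sketch of batched updates.
# Assumption: a batch ends when t %% batch_size == 0.
record_reward <- function(state, t, x, arm, reward, batch_size) {
  state$A_cc[[arm]] <- state$A_cc[[arm]] + outer(x, x)  # always accumulate
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  if (t %% batch_size == 0) {
    state$A <- state$A_cc                               # refresh the scoring snapshot
    state$b <- state$b_cc
    state$last_refresh <- t
  }
  state
}

d <- 2; k <- 2
state <- list(A_cc = replicate(k, diag(d), simplify = FALSE),
              b_cc = replicate(k, numeric(d), simplify = FALSE))
state$A <- state$A_cc
state$b <- state$b_cc
state$last_refresh <- 0

refreshed_at <- integer(0)
for (step in 1:10) {
  before <- state$last_refresh
  state <- record_reward(state, step, x = c(1, 0), arm = 1, reward = 1, batch_size = 4)
  if (state$last_refresh != before) refreshed_at <- c(refreshed_at, step)
}
refreshed_at       # 4 8
state$b_cc[[1]]    # 10 0 (all ten rewards accumulated)
state$b[[1]]       # 8 0 (rounds 9 and 10 not yet visible to scoring)
```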


Method clone()

The objects of this class are cloneable with this method.

Usage

BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.