LinUCB Disjoint Policy with Epsilon-Greedy Exploration

Details

Implements the disjoint LinUCB algorithm, which keeps a separate linear model for each arm and scores arms by their upper confidence bound, combined with epsilon-greedy exploration.
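
For reference, the selection rule usually associated with disjoint LinUCB plus an epsilon-greedy step can be written as below. This is the standard textbook formulation and is not taken from this package's source; the symbols $A_a$, $b_a$, $x_{t,a}$ and $K$ are notation assumed here, while $\alpha$ and $\epsilon$ correspond to the `alpha` and `epsilon` fields.

$$
\hat{\theta}_a = A_a^{-1} b_a, \qquad
p_{t,a} = \hat{\theta}_a^{\top} x_{t,a} + \alpha \sqrt{x_{t,a}^{\top} A_a^{-1} x_{t,a}}
$$

$$
a_t =
\begin{cases}
\text{a uniform draw from } \{1, \dots, K\} & \text{with probability } \epsilon, \\
\arg\max_{a} \, p_{t,a} & \text{otherwise.}
\end{cases}
$$

Under this reading, a larger `alpha` widens the bonus term and a larger `epsilon` increases the share of purely random choices.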

Methods

- `initialize(alpha = 1.0, epsilon = 0.1)`: Create a new LinUCBDisjointPolicyEpsilon object.
- `set_parameters(context_params)`: Initialize arm-level parameters.
- `get_action(t, context)`: Selects an arm using epsilon-greedy UCB.
- `set_reward(t, context, action, reward)`: Updates internal statistics based on observed reward.

Super class

cramR::NA

Public fields

alpha: Numeric, exploration parameter controlling the width of the confidence bound.
epsilon: Numeric, probability of selecting a random action (exploration).
class_name: Internal class name.

Methods

Public methods

LinUCBDisjointPolicyEpsilon$new()
LinUCBDisjointPolicyEpsilon$set_parameters()
LinUCBDisjointPolicyEpsilon$get_action()
LinUCBDisjointPolicyEpsilon$set_reward()
LinUCBDisjointPolicyEpsilon$clone()

Method `new()`

Initializes the policy with UCB parameter alpha and exploration rate epsilon.

Usage

LinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1)

Arguments

alpha: Numeric. Controls width of the UCB bonus.
epsilon: Numeric between 0 and 1. Probability of random action selection.

Method `set_parameters()`

Set arm-specific parameter structures.

Usage

LinUCBDisjointPolicyEpsilon$set_parameters(context_params)

Arguments

context_params: A list with context information, typically including the number of unique features.
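
As a rough illustration of what "arm-specific parameter structures" typically means for disjoint LinUCB, the sketch below builds one matrix and one vector per arm. The helper `init_arm_params` and its arguments `k` (number of arms) and `d` (number of features) are hypothetical names used only here; the class's actual internal layout may differ.

```r
# Hypothetical helper, not part of the package: one (A, b) pair per arm.
init_arm_params <- function(k, d) {
  lapply(seq_len(k), function(arm) {
    list(
      A = diag(1, d, d),  # d x d identity, the usual ridge-style starting point
      b = rep(0, d)       # length-d zero vector, accumulates reward * features
    )
  })
}

arm_params <- init_arm_params(k = 3, d = 2)
length(arm_params)       # one entry per arm
dim(arm_params[[1]]$A)   # square matrix with one row and column per feature
```

Starting `A` at the identity keeps it invertible before any reward has been observed.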

Method `get_action()`

Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).

Usage

LinUCBDisjointPolicyEpsilon$get_action(t, context)

Arguments

t: Integer time step.
context: A list with contextual features and number of arms.

Returns

A list containing the selected action.
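
The selection step can be sketched in base R as follows. This is a minimal stand-alone illustration under stated assumptions, not the package's implementation: `select_arm` and `arm_params` are hypothetical names, a single feature vector `x` is shared by all arms, the function returns a plain integer index rather than a list, and ties go to the lowest index.

```r
# Hypothetical stand-alone sketch of epsilon-greedy UCB arm selection.
select_arm <- function(arm_params, x, alpha = 1.0, epsilon = 0.1) {
  k <- length(arm_params)
  # Explore: with probability epsilon, pick an arm uniformly at random.
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))
  }
  # Exploit: score every arm by estimated reward plus confidence bonus.
  ucb <- vapply(arm_params, function(p) {
    A_inv <- solve(p$A)
    theta_hat <- drop(A_inv %*% p$b)               # per-arm ridge estimate
    mean_part <- sum(x * theta_hat)                # x' theta
    bonus <- alpha * sqrt(sum(x * (A_inv %*% x)))  # alpha * sqrt(x' A^-1 x)
    mean_part + bonus
  }, numeric(1))
  which.max(ucb)  # assumption: ties resolve to the lowest index
}

# Two arms, two features; only arm 2 has accumulated any reward signal.
arm_params <- list(
  list(A = diag(2), b = c(0, 0)),
  list(A = diag(2), b = c(1, 0))
)
greedy_choice <- select_arm(arm_params, x = c(1, 0), alpha = 1, epsilon = 0)
greedy_choice  # arm 2 has the higher score in this setup
```

Setting `epsilon = 0` in the usage lines removes the random branch, which makes the example deterministic.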

Method `set_reward()`

Updates internal statistics using the observed reward for the selected arm.

Usage

LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)

Arguments

t: Integer time step.
context: Contextual features for all arms at time t.
action: A list containing the chosen arm.
reward: A list containing the observed reward for the selected arm.

Returns

Updated internal parameters.
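
For intuition, the update usually applied to the chosen arm in disjoint LinUCB is a rank-one update of `A` and a reward-weighted update of `b`. The sketch below assumes that standard form; `update_arm` is a hypothetical helper, and `x` and `reward` are passed as a plain vector and scalar rather than the lists this method receives.

```r
# Hypothetical helper: standard disjoint LinUCB update for the chosen arm.
update_arm <- function(p, x, reward) {
  p$A <- p$A + tcrossprod(x)  # A <- A + x x'
  p$b <- p$b + reward * x     # b <- b + reward * x
  p
}

arm <- list(A = diag(2), b = c(0, 0))
arm <- update_arm(arm, x = c(1, 2), reward = 0.5)
arm$A
arm$b
```

Only the selected arm is updated; the other arms keep their statistics, which is what makes the model "disjoint".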

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
