
Contextual Linear Bandit Environment


Details

An R6 class for simulating a contextual linear bandit environment with linear mean rewards, Gaussian noise, and an optional conversion to binary rewards.

Methods

- `initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)`: Constructor.
- `post_initialization()`: Loads the correct coefficients for the current simulation based on `sim_id`.
- `get_context(t)`: Returns the context and sets the internal reward vector for the round.
- `get_reward(t, context_common, action)`: Returns the observed reward for an action.

Super class

cramR::Bandit -> ContextualLinearBandit

Public fields

rewards

A vector of rewards for each arm in the current round.

betas

Coefficient matrix of the linear reward model (one column per arm).

sigma

Standard deviation of the Gaussian noise added to rewards.

binary

Logical, indicating whether to convert rewards into binary outcomes.

weights

The latent reward scores before noise and/or binarization.

list_betas

A list of coefficient matrices, one per simulation.

sim_id

Index for selecting which simulation's coefficients to use.

class_name

Name of the class for internal tracking.

Methods



Method new()

Usage

ContextualLinearBandit$new(
  k,
  d,
  list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)

Arguments

k

Number of arms.

d

Number of features.

list_betas

A list of true beta matrices, one per simulation.

sigma

Standard deviation of the Gaussian noise.

binary_rewards

Logical; whether to convert rewards into binary outcomes.
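
For orientation, here is a minimal construction sketch (not taken from the package examples). The beta matrices are random placeholders; their assumed shape is `d` rows by `k` columns, one column per arm, matching the description of `betas` above.

library(cramR)

set.seed(42)
k <- 3  # number of arms
d <- 5  # number of features
# One d x k coefficient matrix per simulation (10 simulations here)
list_betas <- lapply(1:10, function(i) matrix(rnorm(d * k), nrow = d, ncol = k))

bandit <- ContextualLinearBandit$new(
  k = k,
  d = d,
  list_betas = list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)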


Method post_initialization()

Load the coefficient matrix for the current simulation, selected by `sim_id`.

Usage

ContextualLinearBandit$post_initialization()

Returns

No return value; modifies the internal state of the object.
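
A sketch of the intended call pattern, assuming `sim_id` can be set directly on the public field before the call (the indexing into `list_betas` is inferred from the field descriptions above):

bandit$sim_id <- 1           # select the first simulation's coefficients
bandit$post_initialization() # assumed to copy list_betas[[sim_id]] into bandit$betas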


Method get_context()

Usage

ContextualLinearBandit$get_context(t)

Arguments

t

Current time step.

Returns

A list containing the context vector `X` and the arm count `k`.
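
For example, continuing the construction sketch above:

ctx <- bandit$get_context(t = 1)
str(ctx)  # a list with context vector X and arm count k, per Returns above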


Method get_reward()

Usage

ContextualLinearBandit$get_reward(t, context_common, action)

Arguments

t

Current time step.

context_common

Context shared across arms.

action

Action taken by the policy.

Returns

A list with the observed reward and information on the optimal arm and reward.
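
A sketch of a single pull, continuing the example above. The format of `action` is an assumption (a list with a `choice` element, as in the `contextual` package this class resembles); check the class source for the exact structure.

action <- list(choice = sample(ctx$k, 1))  # assumed action format (not confirmed by these docs)
res <- bandit$get_reward(t = 1, context_common = ctx$X, action = action)
str(res)  # reward plus optimal arm/reward info, per Returns above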


Method clone()

The objects of this class are cloneable with this method.

Usage

ContextualLinearBandit$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
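
Cloning follows standard R6 semantics; a deep clone copies mutable fields such as `rewards` and `betas` so the copy evolves independently of the original:

bandit_copy <- bandit$clone(deep = TRUE)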