
Contextual Linear Bandit Environment


Details

An R6 class for simulating a contextual linear bandit environment with linear mean rewards, Gaussian noise, and an optional conversion to binary rewards.

Methods

- `initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)`: Constructor.
- `post_initialization()`: Loads the correct coefficients for the current simulation based on `sim_id`.
- `get_context(t)`: Returns the context and sets the internal reward vector for the round.
- `get_reward(t, context_common, action)`: Returns the observed reward for an action.

Super class

cramR::Bandit -> ContextualLinearBandit

Public fields

rewards

A vector of rewards for each arm in the current round.

betas

Coefficient matrix of the linear reward model (one column per arm).

sigma

Standard deviation of the Gaussian noise added to rewards.

binary

Logical, indicating whether to convert rewards into binary outcomes.

weights

The latent reward scores before noise and/or binarization.

list_betas

A list of coefficient matrices, one per simulation.

sim_id

Index for selecting which simulation's coefficients to use.

class_name

Name of the class for internal tracking.

Methods



Method new()

Usage

ContextualLinearBandit$new(
  k,
  d,
  list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)

Arguments

k

Number of arms.

d

Number of features.

list_betas

A list of true beta matrices, one per simulation.

sigma

Standard deviation of the Gaussian noise.

binary_rewards

Logical; whether to convert rewards into binary outcomes.
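
For orientation, here is a minimal construction sketch (not taken from the package examples). The beta matrices are random placeholders; their assumed shape is `d` rows by `k` columns, one column per arm, matching the description of `betas` above.

library(cramR)

set.seed(42)
k <- 3  # number of arms
d <- 5  # number of features
# One d x k coefficient matrix per simulation (10 simulations here)
list_betas <- lapply(1:10, function(i) matrix(rnorm(d * k), nrow = d, ncol = k))

bandit <- ContextualLinearBandit$new(
  k = k,
  d = d,
  list_betas = list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)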


Method post_initialization()

Load the coefficient matrix for the current simulation, selected by `sim_id`.

Usage

ContextualLinearBandit$post_initialization()

Returns

No return value; modifies the internal state of the object.
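
A sketch of the intended call pattern, assuming `sim_id` can be set directly on the public field before the call (the indexing into `list_betas` is inferred from the field descriptions above):

bandit$sim_id <- 1           # select the first simulation's coefficients
bandit$post_initialization() # assumed to copy list_betas[[sim_id]] into bandit$betas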


Method get_context()

Usage

ContextualLinearBandit$get_context(t)

Arguments

t

Current time step.

Returns

A list containing the context vector `X` and the arm count `k`.
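
For example, continuing the construction sketch above:

ctx <- bandit$get_context(t = 1)
str(ctx)  # a list with context vector X and arm count k, per Returns above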


Method get_reward()

Usage

ContextualLinearBandit$get_reward(t, context_common, action)

Arguments

t

Current time step.

context_common

Context shared across arms.

action

Action taken by the policy.

Returns

A list with the observed reward and information on the optimal arm and reward.
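
A sketch of a single pull, continuing the example above. The format of `action` is an assumption (a list with a `choice` element, as in the `contextual` package this class resembles); check the class source for the exact structure.

action <- list(choice = sample(ctx$k, 1))  # assumed action format (not confirmed by these docs)
res <- bandit$get_reward(t = 1, context_common = ctx$X, action = action)
str(res)  # reward plus optimal arm/reward info, per Returns above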


Method clone()

The objects of this class are cloneable with this method.

Usage

ContextualLinearBandit$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.
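
Cloning follows standard R6 semantics; a deep clone copies mutable fields such as `rewards` and `betas` so the copy evolves independently of the original:

bandit_copy <- bandit$clone(deep = TRUE)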