Batch Contextual Thompson Sampling Policy

Details

Implements Thompson Sampling for linear contextual bandits with batch updates.

Methods

- `initialize(v = 0.2, batch_size = 1)`: Constructor, sets variance and batch size.
- `set_parameters(context_params)`: Initializes arm-level matrices.
- `get_action(t, context)`: Samples from the posterior and selects action.
- `set_reward(t, context, action, reward)`: Updates posterior statistics using observed feedback.

Super class

cramR::NA

Public fields

sigma: Numeric, posterior variance scale parameter.
batch_size: Integer, number of rounds in each mini-batch between parameter updates.
A_cc: List of accumulated Gram matrices per arm.
b_cc: List of reward-weighted context sums per arm.
class_name: Internal name of the class.

Methods

Public methods

BatchContextualLinTSPolicy$new()
BatchContextualLinTSPolicy$set_parameters()
BatchContextualLinTSPolicy$get_action()
BatchContextualLinTSPolicy$set_reward()
BatchContextualLinTSPolicy$clone()

Method `new()`

Constructor for the batch-based Thompson Sampling policy.

Usage

BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)

Arguments

v: Numeric. Standard deviation scaling parameter for posterior sampling.
batch_size: Integer. Number of rounds before parameters are updated.

Method `set_parameters()`

Initializes per-arm sufficient statistics.

Usage

BatchContextualLinTSPolicy$set_parameters(context_params)

Arguments

context_params: List with entries: `unique` (feature vector), `k` (number of arms).
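As a rough illustration of what "per-arm sufficient statistics" can look like, the sketch below builds one Gram matrix and one response vector per arm in base R. It is not the package's code: the helper name `init_arm_stats`, the list names `A` and `b`, the identity and zero starting values, and the idea that the feature dimension `d` comes from the length of `context_params$unique` are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): one Gram matrix and one
# response vector per arm, for k arms and d context features.
init_arm_stats <- function(k, d) {
  list(
    A = replicate(k, diag(1, d), simplify = FALSE),  # d x d identity per arm
    b = replicate(k, rep(0, d), simplify = FALSE)    # zero vector of length d per arm
  )
}

# Assumed sizes for illustration: 3 arms, 2 context features.
stats <- init_arm_stats(k = 3, d = 2)
str(stats$A[[1]])
```

Starting each Gram matrix at the identity keeps it invertible before any rewards have been observed, which is a common choice for ridge-style linear bandits.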

Method `get_action()`

Samples from the posterior distribution of expected rewards and selects an action.

Usage

BatchContextualLinTSPolicy$get_action(t, context)

Arguments

t: Integer. Time step.
context: List containing the current context and arm information.

Returns

A list with the chosen arm (`choice`).
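The sampling step can be sketched in base R as follows. This assumes the common linear Thompson Sampling formulation, in which each arm's coefficient vector is drawn from a Gaussian with mean `solve(A) %*% b` and covariance `v^2 * solve(A)`; the function name `sample_action` and its arguments are hypothetical, and the package's actual implementation may differ.

```r
# Hypothetical sketch (not the package's code): draw one coefficient vector
# per arm from an assumed Gaussian posterior and pick the highest-scoring arm.
sample_action <- function(A, b, x, v) {
  scores <- vapply(seq_along(A), function(arm) {
    A_inv <- solve(A[[arm]])                 # inverse Gram matrix
    theta_hat <- A_inv %*% b[[arm]]          # posterior mean
    z <- rnorm(length(x))                    # standard normal draw
    # t(chol(A_inv)) %*% z has covariance A_inv, so this is a draw
    # from N(theta_hat, v^2 * A_inv).
    theta_tilde <- theta_hat + v * t(chol(A_inv)) %*% z
    as.numeric(crossprod(x, theta_tilde))    # sampled expected reward
  }, numeric(1))
  list(choice = which.max(scores))
}

set.seed(1)
A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(10, 10))                # arm 2 has much stronger evidence
action <- sample_action(A, b, x = c(1, 1), v = 0.01)
action$choice
```

A larger `v` widens the sampled distribution and so increases exploration, while a value near zero makes the choice close to greedy.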

Method `set_reward()`

Updates the accumulated Gram matrix and response vector for the chosen arm. The parameters used for sampling are refreshed only every `batch_size` rounds.

Usage

BatchContextualLinTSPolicy$set_reward(t, context, action, reward)

Arguments

t: Integer. Time step.
context: Context object containing feature info.
action: Chosen action (arm index).
reward: Observed reward for the action.

Returns

Updated internal parameters.
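One way to picture the batch update is sketched below. It assumes that `A_cc` and `b_cc` accumulate statistics on every round and are copied into the working parameters `A` and `b` only when `t` is a multiple of `batch_size`; the function `update_batch`, the `state` list, and that copy rule are illustrative assumptions rather than the package's confirmed logic.

```r
# Hypothetical sketch (not the package's code): accumulate every round,
# refresh the parameters used for sampling once per batch.
update_batch <- function(state, arm, x, reward, t, batch_size) {
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)  # add outer product x x^T
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x     # add reward-weighted context
  if (t %% batch_size == 0) {
    state$A <- state$A_cc   # assumed refresh rule: copy accumulators
    state$b <- state$b_cc
  }
  state
}

state <- list(A = list(diag(2)), b = list(c(0, 0)),
              A_cc = list(diag(2)), b_cc = list(c(0, 0)))
state <- update_batch(state, arm = 1, x = c(1, 2), reward = 1, t = 1, batch_size = 2)
after_first <- state$A[[1]]   # unchanged, the batch is not yet complete
state <- update_batch(state, arm = 1, x = c(1, 0), reward = 0, t = 2, batch_size = 2)
state$A[[1]]
```

With `batch_size = 1` the copy happens on every round, so the sketch reduces to the usual fully online update.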

Method `clone()`

The objects of this class are cloneable with this method.

Usage

BatchContextualLinTSPolicy$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
