Batch Contextual Thompson Sampling Policy

Details

Implements Thompson Sampling for linear contextual bandits with batch updates.
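For orientation, the standard linear Thompson Sampling model can be written as below. This is the textbook formulation and is an assumption here, since this page does not state the exact prior or update rule; $x_s$, $r_s$, and $a_s$ denote the context, reward, and chosen arm at round $s$, and $d$ is the context dimension.

```latex
A_a = I_d + \sum_{s:\, a_s = a} x_s x_s^\top, \qquad
b_a = \sum_{s:\, a_s = a} r_s\, x_s, \qquad
\hat\theta_a = A_a^{-1} b_a, \qquad
\tilde\theta_a \sim \mathcal{N}\!\left(\hat\theta_a,\; v^2 A_a^{-1}\right)
```

Under this formulation the policy plays the arm that maximizes $x_t^\top \tilde\theta_a$. With batch updates, the $A_a$ and $b_a$ used for sampling would be refreshed only once every `batch_size` rounds.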

Methods

- `initialize(v = 0.2, batch_size = 1)`: Constructor, sets variance and batch size.
- `set_parameters(context_params)`: Initializes arm-level matrices.
- `get_action(t, context)`: Samples from the posterior and selects action.
- `set_reward(t, context, action, reward)`: Updates posterior statistics using observed feedback.

Super class

cramR::NA

Public fields

sigma

Numeric, posterior variance scale parameter.

batch_size

Integer, number of rounds in each mini-batch; parameters are updated once per batch.

A_cc

List of accumulated Gram matrices per arm.

b_cc

List of reward-weighted context sums per arm.

class_name

Internal name of the class.

Methods


Method new()

Constructor for the batch-based Thompson Sampling policy.

Usage

BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)

Arguments

v

Numeric. Standard deviation scaling parameter for posterior sampling.

batch_size

Integer. Number of rounds before parameters are updated.


Method set_parameters()

Initializes per-arm sufficient statistics.

Usage

BatchContextualLinTSPolicy$set_parameters(context_params)

Arguments

context_params

List with entries: `unique` (feature vector), `k` (number of arms).
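A minimal base-R sketch of what per-arm initialization might look like. It assumes each arm starts from an identity Gram matrix and a zero response vector, which is the usual choice for linear Thompson Sampling; this page does not state the actual starting values. The helper `init_arm_stats` and the context dimension `d` are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helper: build per-arm sufficient statistics for k arms
# and d-dimensional contexts. Identity/zero starting values are an assumption.
init_arm_stats <- function(k, d) {
  list(
    A_cc = replicate(k, diag(d), simplify = FALSE),    # one d x d Gram matrix per arm
    b_cc = replicate(k, numeric(d), simplify = FALSE)  # one length-d vector per arm
  )
}

stats <- init_arm_stats(k = 3, d = 2)
length(stats$A_cc)    # one entry per arm
dim(stats$A_cc[[1]])  # each Gram matrix is d x d
```

Keeping the statistics in plain lists indexed by arm mirrors the `A_cc` and `b_cc` fields described above.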


Method get_action()

Samples from the posterior distribution of expected rewards and selects an action.

Usage

BatchContextualLinTSPolicy$get_action(t, context)

Arguments

t

Integer. Time step.

context

List containing the current context and arm information.

Returns

A list with the chosen arm (`choice`).
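The sampling step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: it assumes the posterior for each arm is Gaussian with mean `solve(A) %*% b` and covariance `v^2 * solve(A)`, and it uses a flat context vector `x` rather than the `context` list the method receives. The helper name `sample_action` is hypothetical.

```r
# Hypothetical helper: one Thompson Sampling draw per arm, then argmax.
# A_list / b_list hold per-arm Gram matrices and response vectors, x is the
# context vector, and v scales the posterior standard deviation
# (assumption: posterior covariance is v^2 * solve(A)).
sample_action <- function(A_list, b_list, x, v = 0.2) {
  scores <- vapply(seq_along(A_list), function(arm) {
    A_inv <- solve(A_list[[arm]])
    theta_hat <- A_inv %*% b_list[[arm]]          # posterior mean
    L <- t(chol(A_inv))                           # lower factor, L %*% t(L) == A_inv
    theta_tilde <- theta_hat + v * L %*% rnorm(length(x))  # posterior draw
    as.numeric(crossprod(x, theta_tilde))         # sampled expected reward
  }, numeric(1))
  list(choice = which.max(scores))
}

set.seed(1)
A_list <- list(diag(2), diag(2))
b_list <- list(c(0, 0), c(5, 5))  # arm 2 has a much larger estimated reward
action <- sample_action(A_list, b_list, x = c(1, 1), v = 0.2)
action$choice
```

Returning a list with a `choice` element matches the return value documented above. Larger values of `v` widen the posterior draws and therefore increase exploration.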


Method set_reward()

Updates Gram matrix and response vector for the chosen arm. Parameters are refreshed every `batch_size` rounds.

Usage

BatchContextualLinTSPolicy$set_reward(t, context, action, reward)

Arguments

t

Integer. Time step.

context

Context object containing feature info.

action

Chosen action (arm index).

reward

Observed reward for the action.

Returns

Updated internal parameters.
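One way to picture the batched update is sketched below in base R. It assumes statistics are accumulated every round (as in `A_cc` and `b_cc`) and copied into the parameters used for sampling only when `t` is a multiple of `batch_size`; the exact refresh rule is not stated on this page. The helper `update_batch` and the `A` / `b` slots for the live parameters are hypothetical names.

```r
# Hypothetical helper: accumulate statistics every round, but copy them into
# the "live" parameters used for sampling only at the end of each batch.
update_batch <- function(state, t, x, arm, reward, batch_size = 1) {
  # Accumulate for the chosen arm (these play the role of A_cc and b_cc).
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)  # adds x %*% t(x)
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x
  # Refresh the live parameters once per batch (assumed rule).
  if (t %% batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

state <- list(
  A_cc = list(diag(2)), b_cc = list(c(0, 0)),
  A    = list(diag(2)), b    = list(c(0, 0))
)
state <- update_batch(state, t = 1, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
state$b[[1]]  # unchanged so far: the batch is not complete
state <- update_batch(state, t = 2, x = c(0, 1), arm = 1, reward = 1, batch_size = 2)
state$b[[1]]  # now reflects both observations
```

With `batch_size = 1` this reduces to the usual per-round update, since the live parameters are refreshed after every observation.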


Method clone()

The objects of this class are cloneable with this method.

Usage

BatchContextualLinTSPolicy$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.