Batch Contextual Thompson Sampling Policy
Source: R/armed_bandit_helpers.R
BatchContextualLinTSPolicy.Rd
Methods
- `initialize(v = 0.2, batch_size = 1)`: Constructor, sets variance and batch size.
- `set_parameters(context_params)`: Initializes arm-level matrices.
- `get_action(t, context)`: Samples from the posterior and selects action.
- `set_reward(t, context, action, reward)`: Updates posterior statistics using observed feedback.
Public fields
sigma
Numeric, posterior variance scale parameter.
batch_size
Integer, size of mini-batches before parameter updates.
A_cc
List of accumulated Gram matrices per arm.
b_cc
List of reward-weighted context sums per arm.
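To make the shape of `A_cc` and `b_cc` concrete, here is a minimal base-R sketch of per-arm statistics. It assumes `k` arms and `d` context features, and it assumes the usual linear Thompson sampling starting point (identity Gram matrices, zero vectors). The helper `init_arm_stats` and both size arguments are hypothetical and not taken from the package, which may initialize these fields differently.

```r
# Hypothetical sizes: k arms, d context features (not taken from the package).
init_arm_stats <- function(k, d) {
  list(
    # One d x d Gram matrix per arm, started at the identity (assumed prior).
    A_cc = replicate(k, diag(1, d), simplify = FALSE),
    # One length-d reward-weighted context sum per arm, started at zero.
    b_cc = replicate(k, rep(0, d), simplify = FALSE)
  )
}

stats <- init_arm_stats(k = 3, d = 2)
length(stats$A_cc)    # one entry per arm
dim(stats$A_cc[[1]])  # each Gram matrix is d x d
```

Keeping one matrix and one vector per arm means an update for one arm never touches the statistics of the others.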
class_name
Internal name of the class.
Methods
Method new()
Constructor for the batch-based Thompson Sampling policy.
Usage
BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)
Method get_action()
Samples from the posterior distribution of expected rewards and selects an action.
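The sampling step can be sketched in base R as follows. This is an illustration of standard linear Thompson sampling, not the package's code: it assumes each arm's coefficient posterior is Gaussian with mean `solve(A) %*% b` and covariance `sigma * solve(A)`, and the helpers `rmvnorm_chol` and `select_arm_ts` are hypothetical names introduced here.

```r
# Draw one sample from N(mu, Sigma) using a Cholesky factor (base R only).
rmvnorm_chol <- function(mu, Sigma) {
  R <- chol(Sigma)  # upper triangular, t(R) %*% R equals Sigma
  as.numeric(mu + t(R) %*% rnorm(length(mu)))
}

# Hypothetical helper: choose an arm by Thompson sampling.
# x: context vector; A, b: per-arm lists; sigma: assumed variance scale.
select_arm_ts <- function(x, A, b, sigma) {
  scores <- vapply(seq_along(A), function(arm) {
    A_inv <- solve(A[[arm]])
    theta_hat <- as.numeric(A_inv %*% b[[arm]])            # posterior mean
    theta_tilde <- rmvnorm_chol(theta_hat, sigma * A_inv)  # posterior draw
    sum(x * theta_tilde)                                   # sampled expected reward
  }, numeric(1))
  which.max(scores)
}

set.seed(1)
A <- list(diag(2), diag(2))
b <- list(c(0, 0), c(5, 5))
chosen <- select_arm_ts(x = c(1, 1), A = A, b = b, sigma = 0.04)
chosen
```

Because the score for each arm comes from a random draw rather than from the posterior mean, arms with uncertain estimates still get selected occasionally, which is how the policy explores.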
Method set_reward()
Updates Gram matrix and response vector for the chosen arm. Parameters are refreshed every `batch_size` rounds.
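One plausible reading of this behavior, sketched in base R: the Gram matrix and response vector accumulate on every round, while the point estimates used for sampling are recomputed only when a batch completes. The names `make_state`, `update_batched`, and the `theta` field are hypothetical, and the rule `t %% batch_size == 0` is an assumption about when a batch ends.

```r
# Hypothetical state: statistics (A, b) plus cached estimates (theta) per arm.
make_state <- function(k, d) {
  list(
    A = replicate(k, diag(1, d), simplify = FALSE),
    b = replicate(k, rep(0, d), simplify = FALSE),
    theta = replicate(k, rep(0, d), simplify = FALSE)
  )
}

update_batched <- function(state, t, x, arm, reward, batch_size) {
  state$A[[arm]] <- state$A[[arm]] + outer(x, x)  # Gram matrix update
  state$b[[arm]] <- state$b[[arm]] + reward * x   # response vector update
  if (t %% batch_size == 0) {
    # End of a batch (assumed rule): refresh every arm's estimate.
    state$theta <- Map(function(A, b) as.numeric(solve(A, b)), state$A, state$b)
  }
  state
}

state <- make_state(k = 2, d = 2)
state <- update_batched(state, t = 1, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
theta_mid <- state$theta[[1]]  # unchanged: the batch is not complete yet
state <- update_batched(state, t = 2, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
theta_end <- state$theta[[1]]  # refreshed from the accumulated statistics
```

With `batch_size = 1` the refresh runs on every round, which reduces to the ordinary fully online update.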