Updates the parameters of a linear Thompson Sampling model for multi-armed bandit problems based on new observations.

Usage

update_thompson(ws, yobs, model, xs = NULL, ps = NULL, balanced = NULL)

Arguments

ws

Integer vector. The arm chosen for each observation, one entry per time t. Length A, where A is the number of observations. Must not contain NA values.

yobs

Numeric vector. Observed outcomes, length A. Must not contain NA values.

model

List. Contains the parameters of the LinTSModel.

xs

Optional matrix. Covariates of shape [A, p], where p is the number of features. Used when the LinTSModel is contextual. Must not contain NA values. Default is NULL.

ps

Optional matrix. Probabilities of selecting each arm for each observation. Used when the LinTSModel is balanced. Default is NULL.

balanced

Logical. Indicates whether to use balanced Thompson Sampling. Default is NULL.
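
The constraints listed above (matching lengths, no NA values, one row of xs per observation) can be checked before an update. The helper below is a minimal sketch in base R. The name check_update_inputs is hypothetical and is not part of the package, and it covers only the constraints stated on this page.

```r
# Hypothetical helper (not part of the package): checks the documented
# constraints on ws, yobs, and xs before calling update_thompson().
check_update_inputs <- function(ws, yobs, xs = NULL) {
  A <- length(ws)  # number of observations
  stopifnot(
    !anyNA(ws),            # ws must not contain NA values
    length(yobs) == A,     # yobs has one entry per observation
    !anyNA(yobs)           # yobs must not contain NA values
  )
  if (!is.null(xs)) {
    stopifnot(
      is.matrix(xs),       # xs is a matrix of shape [A, p]
      nrow(xs) == A,       # one row per observation
      !anyNA(xs)           # xs must not contain NA values
    )
  }
  invisible(TRUE)
}

# Passes silently for consistent inputs
check_update_inputs(ws = c(1L, 2L, 1L), yobs = c(0.3, 1.1, -0.4),
                    xs = matrix(0, nrow = 3, ncol = 2))
```

Running a check like this first turns a shape mismatch into an immediate, readable error instead of a failure inside the update.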

Value

A list containing the updated parameters of the LinTSModel.

Examples

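set.seed(123)
model <- LinTSModel(K = 5, p = 3, floor_start = 1, floor_decay = 0.9, num_mc = 100,
                    is_contextual = TRUE)
A <- 1000
# Simulated inputs for illustration: arm labels in 1..K, outcomes, and covariates
ws <- sample(1:5, A, replace = TRUE)
yobs <- rnorm(A)
xs <- matrix(rnorm(A * 3), nrow = A, ncol = 3)
# The model is contextual, so the covariates are passed as well
model <- update_thompson(ws = ws, yobs = yobs, model = model, xs = xs)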
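
To give a rough idea of what updating a linear Thompson Sampling model involves, the sketch below fits a ridge-regularized linear model per arm and draws one coefficient vector from each arm's Gaussian posterior. It is a generic textbook-style illustration in base R, not the package's internal implementation. The names lambda, sigma2, fit_arm, and draw_arm are assumptions made for this example.

```r
# Generic illustration only; not how update_thompson() is implemented.
set.seed(1)
K <- 3; p <- 2; A <- 300
xs   <- matrix(rnorm(A * p), nrow = A, ncol = p)  # covariates, shape [A, p]
ws   <- sample(seq_len(K), A, replace = TRUE)     # chosen arm per observation
yobs <- rnorm(A)                                  # observed outcomes

lambda <- 1  # assumed ridge penalty (prior precision scale)
sigma2 <- 1  # assumed noise variance

# Posterior for one arm under a Gaussian prior and Gaussian noise
fit_arm <- function(w) {
  X <- cbind(1, xs[ws == w, , drop = FALSE])      # intercept plus covariates
  y <- yobs[ws == w]
  V <- solve(crossprod(X) + lambda * diag(p + 1)) # (X'X + lambda I)^-1
  list(mu = drop(V %*% crossprod(X, y)),          # posterior mean
       V  = sigma2 * V)                           # posterior covariance
}
posterior <- lapply(seq_len(K), fit_arm)

# One Thompson draw per arm: mu + t(R) z, where V = t(R) %*% R
draw_arm <- function(post) {
  R <- chol(post$V)
  post$mu + drop(t(R) %*% rnorm(length(post$mu)))
}
draws <- lapply(posterior, draw_arm)
```

Each element of draws is a sampled coefficient vector of length p + 1 (intercept first). In a Thompson Sampling loop, the arm whose sampled model predicts the highest outcome for the current covariates would be chosen next.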