Updates the parameters of a linear Thompson Sampling model for multi-armed bandit problems based on new observations.
Arguments
- ws
Integer vector. Indicates which arm was chosen for observations at each time t. Length A, where A is the number of observations. Must not contain NA values.
- yobs
Numeric vector. Observed outcomes, length A. Must not contain NA values.
- model
List. Contains the parameters of the LinTSModel.
- xs
Optional matrix. Covariates of shape [A, p], where p is the number of features, if the LinTSModel is contextual. Default is NULL. Must not contain NA values.
- ps
Optional matrix. Probabilities of selecting each arm for each observation, if the LinTSModel is balanced. Default is NULL.
- balanced
Logical. Indicates whether to use balanced Thompson Sampling. Default is NULL.
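The length and NA constraints listed above can be checked before calling the update. The following is a minimal sketch in base R; check_update_inputs is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (an assumption, not a package function): verifies the
# documented constraints on ws, yobs, and the optional xs matrix.
check_update_inputs <- function(ws, yobs, xs = NULL) {
  A <- length(ws)  # number of observations
  if (anyNA(ws)) stop("ws must not contain NA values")
  if (anyNA(yobs)) stop("yobs must not contain NA values")
  if (length(yobs) != A) stop("yobs must have length A = length(ws)")
  if (!is.null(xs)) {
    if (!is.matrix(xs)) stop("xs must be a matrix")
    if (nrow(xs) != A) stop("xs must have A rows")
    if (anyNA(xs)) stop("xs must not contain NA values")
  }
  invisible(TRUE)
}

# Illustrative inputs: A observations, p features, arms labelled 1 to 5.
set.seed(1)
A <- 10
p <- 3
ws <- sample(1:5, A, replace = TRUE)
yobs <- rnorm(A)
xs <- matrix(rnorm(A * p), nrow = A, ncol = p)
check_update_inputs(ws, yobs, xs)
```

The helper stops with an error message on the first violated constraint and returns TRUE invisibly otherwise.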
Examples
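set.seed(123)
# Create a contextual LinTSModel with K = 5 arms and p = 3 features.
model <- LinTSModel(K = 5, p = 3, floor_start = 1, floor_decay = 0.9, num_mc = 100,
                    is_contextual = TRUE)
# Number of observations.
A <- 1000
# Placeholder inputs: length-A vectors of zeros for the chosen arms and outcomes.
ws <- numeric(A)
yobs <- numeric(A)
# Update the model parameters with the new observations.
model <- update_thompson(ws = ws, yobs = yobs, model = model)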
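For intuition about what updating a linear model per arm can involve, here is a sketch of a conjugate Bayesian linear regression update for a single arm. It is an illustration under stated assumptions (Gaussian prior with precision lambda, unit noise variance) and is not the package's implementation; update_arm_posterior, lambda, and the simulated data are all hypothetical.

```r
# Sketch only, NOT the package's update_thompson: posterior mean and
# covariance of the coefficients for one arm, assuming a N(0, I / lambda)
# prior and unit noise variance.
update_arm_posterior <- function(xs, yobs, ws, arm, lambda = 1) {
  idx <- ws == arm                 # observations where this arm was chosen
  X <- xs[idx, , drop = FALSE]
  y <- yobs[idx]
  p <- ncol(xs)
  # Posterior precision = prior precision + X'X
  precision <- lambda * diag(p) + crossprod(X)
  V <- solve(precision)            # posterior covariance
  mu <- V %*% crossprod(X, y)      # posterior mean
  list(mu = drop(mu), V = V)
}

# Simulated data with known coefficients (assumed for illustration).
set.seed(123)
A <- 200
p <- 3
K <- 5
xs <- matrix(rnorm(A * p), nrow = A)
ws <- sample(seq_len(K), A, replace = TRUE)
beta <- c(1, -1, 0.5)
yobs <- drop(xs %*% beta) + rnorm(A, sd = 0.1)

post <- update_arm_posterior(xs, yobs, ws, arm = 1)
post$mu
```

In a Thompson Sampling step, coefficients would then be drawn from a normal distribution with mean mu and covariance V for each arm, and the arm with the highest predicted outcome would be selected.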