Creates a linear Thompson Sampling model for multi-armed bandit problems.
Arguments
- K
Integer. Number of arms. Must be a positive integer.
- p
Integer. Dimension of the contextual vector, if is_contextual is set to TRUE. Otherwise, p is ignored. Must be a positive integer.
- floor_start
Numeric. Specifies the initial value for the assignment probability floor. It ensures that at the start of the process, no assignment probability falls below this threshold. Must be a positive number.
- floor_decay
Numeric. Decay rate of the floor. The floor decays with the number of observations in the experiment such that at each point in time, the applied floor is floor_start/(s^{floor_decay}), where s is the starting index for a batched experiment, or the observation index for an online experiment. Must be a number between 0 and 1 (inclusive).
- num_mc
Integer. Number of Monte Carlo simulations used to approximate the expected reward. Must be a positive integer. Default is 100.
- is_contextual
Logical. Indicates whether the problem is contextual or not. Default is TRUE.
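
The interaction between floor_start and floor_decay can be easier to see with numbers. The following is a minimal sketch that only evaluates the formula floor_start/(s^{floor_decay}) given above. The helper name applied_floor and the example values (0.05 and 0.5) are assumptions chosen for illustration and are not part of the documented function.

```r
# Hypothetical helper (not part of the package): evaluates the floor formula
# floor_start / s^floor_decay from the floor_decay description.
applied_floor <- function(s, floor_start, floor_decay) {
  # Mirror the documented constraints on the two floor arguments.
  stopifnot(all(s >= 1), floor_start > 0, floor_decay >= 0, floor_decay <= 1)
  floor_start / s^floor_decay
}

# Observation (or batch starting) indices at which to evaluate the floor.
s <- c(1, 10, 100, 1000)

# With floor_decay = 0.5 the floor shrinks as s grows.
decaying <- applied_floor(s, floor_start = 0.05, floor_decay = 0.5)

# With floor_decay = 0 the floor stays at floor_start for every s.
constant <- applied_floor(s, floor_start = 0.05, floor_decay = 0)

print(decaying)
print(constant)
```

Under this formula, a floor_decay of 0 keeps the floor fixed at floor_start, while a floor_decay of 1 makes it shrink in proportion to 1/s, so larger values relax the floor faster as observations accumulate.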