bandit.Bandit
bandit.Bandit()
An abstract class for bandit algorithms used in the MAD algorithm.
Every bandit algorithm that inherits from this class must implement all of its abstract methods.
Notes
See the detailed method documentation for in-depth explanations.
Methods
Name | Description |
---|---|
control | Get the index of the bandit control arm. |
k | Get the number of bandit arms. |
probabilities | Calculate bandit arm assignment probabilities. |
reward | Calculate the reward for a selected bandit arm. |
t | Get the current time step of the bandit. |
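As a rough illustration of what implementing the five methods might look like, the sketch below defines a bandit that assigns every arm with equal probability. To keep it self-contained, `Bandit` and `Reward` here are local stand-ins, not the library's own classes. The `outcome` and `covariates` field names, the `UniformBandit` class, and its constructor arguments are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional
import random


@dataclass
class Reward:
    """Hypothetical stand-in for the library's Reward type (field names assumed)."""
    outcome: float
    covariates: Optional[dict] = None


class Bandit(ABC):
    """Local stand-in that mirrors the five abstract methods listed above."""

    @abstractmethod
    def control(self) -> int: ...

    @abstractmethod
    def k(self) -> int: ...

    @abstractmethod
    def probabilities(self) -> Dict[int, float]: ...

    @abstractmethod
    def reward(self, arm: int) -> Reward: ...

    @abstractmethod
    def t(self) -> int: ...


class UniformBandit(Bandit):
    """Illustrative bandit: every arm gets the same assignment probability."""

    def __init__(self, means: List[float], control: int = 0, seed: int = 0):
        self._means = list(means)   # assumed Bernoulli success rate per arm
        self._control = control
        self._t = 1                 # time steps start at 1, not 0
        self._rng = random.Random(seed)

    def control(self) -> int:
        return self._control

    def k(self) -> int:
        return len(self._means)

    def probabilities(self) -> Dict[int, float]:
        # Keys are arm indices; values sum to 1.
        return {arm: 1.0 / self.k() for arm in range(self.k())}

    def reward(self, arm: int) -> Reward:
        # Simulated Bernoulli outcome for the selected arm.
        outcome = float(self._rng.random() < self._means[arm])
        return Reward(outcome=outcome)

    def t(self) -> int:
        # Return the current step, then advance the counter.
        step = self._t
        self._t += 1
        return step
```

Because the stand-in base class declares every method abstract, trying to instantiate a subclass that omits one of them raises a `TypeError`, which is the usual way Python enforces this kind of contract.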
control
bandit.Bandit.control()
Get the index of the bandit control arm.
Returns
Type | Description |
---|---|
int | The index of the arm that is the control arm. E.g. if the bandit is a 3-arm bandit with the first arm being the control arm, this should return the value 0. |
k
bandit.Bandit.k()
Get the number of bandit arms.
Returns
Type | Description |
---|---|
int | The number of arms in the bandit. |
probabilities
bandit.Bandit.probabilities()
Calculate bandit arm assignment probabilities.
Returns
Type | Description |
---|---|
Dict[int, float] | A dictionary where keys are arm indices and values are the corresponding probabilities. For example, if the bandit algorithm is UCB with three arms, and the third arm has the maximum confidence bound, then this should return the following dictionary: `{0: 0., 1: 0., 2: 1.}`, since UCB is deterministic. |
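The deterministic UCB case described above can be sketched as a small helper that puts all probability mass on the arm with the largest upper confidence bound. This is an illustration and not the library's implementation: the function name `ucb_probabilities`, its arguments, and the choice of the classic UCB1 bonus `sqrt(2 ln t / n)` are assumptions made for the example.

```python
import math
from typing import Dict, List


def ucb_probabilities(means: List[float], counts: List[int], t: int) -> Dict[int, float]:
    """Return one-hot assignment probabilities for a UCB1-style rule.

    means  -- running mean reward per arm (hypothetical input)
    counts -- number of pulls per arm; each must be at least 1
    t      -- current time step, starting at 1
    """
    # Upper confidence bound for each arm: mean plus exploration bonus.
    bounds = [m + math.sqrt(2.0 * math.log(t) / n) for m, n in zip(means, counts)]
    # Ties resolve to the lowest arm index because max() keeps the first maximum.
    best = max(range(len(bounds)), key=bounds.__getitem__)
    return {arm: float(arm == best) for arm in range(len(bounds))}


# All arms have been pulled equally often, so the arm with the highest mean wins.
print(ucb_probabilities([0.2, 0.3, 0.6], [10, 10, 10], t=30))
# {0: 0.0, 1: 0.0, 2: 1.0}
```

A randomized algorithm such as Thompson sampling would instead return a dictionary whose values are spread across several arms.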
reward
bandit.Bandit.reward(arm)
Calculate the reward for a selected bandit arm.
Returns the reward for a selected arm.
Parameters
Name | Type | Description | Default |
---|---|---|---|
arm | int | The index of the selected bandit arm. | required |
Returns
Type | Description |
---|---|
Reward | The resulting Reward containing any individual-level covariates and the observed reward. |
t
bandit.Bandit.t()
Get the current time step of the bandit.
This method returns the current time step of the bandit and then increments the internal time step by 1. For example, if the bandit has completed 9 iterations, this should return the value 10. Time steps start at 1, not 0.
Returns
Type | Description |
---|---|
int | The current time step. |
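The return-then-increment behaviour of `t()` can be sketched with a plain counter. The `StepCounter` class is hypothetical and exists only to show the expected sequence of values. A real bandit would keep this counter alongside its other state.

```python
class StepCounter:
    """Hypothetical helper showing the time-step contract of t()."""

    def __init__(self) -> None:
        self._t = 1  # time steps start at 1, not 0

    def t(self) -> int:
        current = self._t   # value reported to the caller
        self._t += 1        # advance for the next call
        return current


counter = StepCounter()
for _ in range(9):
    counter.t()             # nine completed iterations return 1 through 9
print(counter.t())          # 10
```

Since every call advances the counter, code that needs the same time step more than once should store the returned value instead of calling `t()` again.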