bandit
Provides Bandit class for bandit simulations.
Bandit(arms, rng=None, seed=None)
Multi-armed bandit with one or more arms.
This class wraps around a list of arms, each of which has a reward distribution. It provides an interface for interacting with the arms, such as playing a specific arm, querying for the optimal arm, and computing regret from a given choice.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arms | list[Arm] | A list of arms for the bandit. | required |
rng | Generator \| None | A random number generator. | None |
seed | int \| None | A seed for random number generation if rng is not provided. | None |
Source code in mabby/bandit.py
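The sketch below shows one way the rng and seed parameters can interact. It is not mabby's code. It assumes that an explicitly passed rng takes precedence and that seed is used only to build a fresh generator when rng is None. It also uses the standard library's random.Random in place of NumPy's Generator so the snippet has no dependencies. SketchBandit is a hypothetical name, and the arms are placeholders.

```python
import random
from typing import Optional


class SketchBandit:
    """Hypothetical stand-in for Bandit's constructor logic (not mabby's code)."""

    def __init__(self, arms: list, rng: Optional[random.Random] = None,
                 seed: Optional[int] = None) -> None:
        self._arms = list(arms)
        # Assumption: an explicit rng wins; otherwise a new generator is seeded.
        self._rng = rng if rng is not None else random.Random(seed)


# Two bandits built from the same seed produce identical random streams.
a = SketchBandit(arms=["arm0", "arm1"], seed=42)  # placeholder arms
b = SketchBandit(arms=["arm0", "arm1"], seed=42)
same = [a._rng.random() for _ in range(3)] == [b._rng.random() for _ in range(3)]
print(same)  # True

# A generator passed explicitly is used as-is, and seed is ignored.
shared = random.Random(7)
c = SketchBandit(arms=["arm0", "arm1"], rng=shared, seed=123)
print(c._rng is shared)  # True
```

Passing a shared generator lets several components draw from one reproducible stream. Passing only a seed gives each bandit an independent, repeatable stream.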
means: list[float] (property)
The mean (expected) reward of each arm.
__getitem__(i)
Returns an arm by index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
i | int | The index of the arm to get. | required |
Returns:
Type | Description |
---|---|
Arm | The arm at the given index. |
__iter__()
Returns an iterator over the bandit's arms.
__len__()
Returns the number of arms.
__repr__()
Returns a string representation of the bandit.
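Together, __len__, __getitem__, __iter__, and __repr__ let a bandit behave like a read-only sequence of arms, and the means property can be built on top of iteration. The sketch below is a hypothetical illustration, not mabby's implementation. BernoulliArm and SketchBandit are made-up names, each arm is assumed to expose a mean attribute, and the repr format is invented for the example.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class BernoulliArm:
    """Hypothetical arm: reward 1.0 with probability p, else 0.0."""

    p: float

    @property
    def mean(self) -> float:
        # The expected reward of a Bernoulli arm is its success probability.
        return self.p


class SketchBandit:
    """Hypothetical sketch of Bandit's container methods (not mabby's code)."""

    def __init__(self, arms: list[BernoulliArm]) -> None:
        self._arms = list(arms)

    def __len__(self) -> int:
        return len(self._arms)

    def __getitem__(self, i: int) -> BernoulliArm:
        return self._arms[i]

    def __iter__(self) -> Iterator[BernoulliArm]:
        return iter(self._arms)

    def __repr__(self) -> str:
        # Assumed format, for illustration only.
        return f"SketchBandit({self._arms!r})"

    @property
    def means(self) -> list[float]:
        # One expected reward per arm, in arm order.
        return [arm.mean for arm in self]


bandit = SketchBandit([BernoulliArm(0.2), BernoulliArm(0.7)])
print(len(bandit))   # 2
print(bandit[1])     # BernoulliArm(p=0.7)
print(bandit.means)  # [0.2, 0.7]
print(repr(bandit))  # SketchBandit([BernoulliArm(p=0.2), BernoulliArm(p=0.7)])
```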
best_arm()
Returns the index of the optimal arm.
The optimal arm is the arm with the greatest expected reward. If there are multiple arms with equal expected rewards, a random one is chosen.
Returns:
Type | Description |
---|---|
int | The index of the optimal arm. |
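The tie-breaking rule for best_arm can be sketched as follows: collect every index whose mean equals the maximum, then pick one uniformly at random. This is a hypothetical helper operating on a plain list of means. It uses random.Random instead of NumPy's Generator, and it is not claimed to match mabby's exact code.

```python
import random


def best_arm(means: list[float], rng: random.Random) -> int:
    """Hypothetical helper: index of the highest mean, ties broken uniformly at random."""
    best = max(means)
    # Every index that attains the maximum expected reward is a candidate.
    candidates = [i for i, m in enumerate(means) if m == best]
    return rng.choice(candidates)


rng = random.Random(0)
print(best_arm([0.1, 0.9, 0.4], rng))  # 1 (unique maximum)

# With a tie between arms 0 and 2, repeated calls should return both indices.
picks = {best_arm([0.5, 0.2, 0.5], rng) for _ in range(100)}
print(sorted(picks))
```

Breaking ties at random avoids a systematic bias toward lower indices, which a plain "first maximum" rule would introduce.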
is_opt(choice)
Returns whether a given choice is optimal.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
choice | int | The index of the chosen arm. | required |
Returns:
Type | Description |
---|---|
bool | True if the chosen arm is optimal, False otherwise. |
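A minimal sketch of an optimality check follows. It assumes is_opt compares the chosen arm's expected reward against the maximum expected reward, not against the single index returned by best_arm. Under that assumption, every arm tied for the best mean counts as optimal. is_opt here is a hypothetical free function over a list of means, not mabby's method.

```python
def is_opt(means: list[float], choice: int) -> bool:
    """Hypothetical helper: True if the chosen arm's mean equals the best mean."""
    # Exact float comparison is fine here because both values come from the same list.
    return means[choice] == max(means)


means = [0.3, 0.8, 0.8]
print(is_opt(means, 1))  # True
print(is_opt(means, 2))  # True (tied arms are all optimal)
print(is_opt(means, 0))  # False
```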
play(i)
Plays an arm by index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
i | int | The index of the arm to play. | required |
Returns:
Type | Description |
---|---|
float | The reward from playing the arm. |
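One plausible design for play(i) is to delegate to arm i and pass it the bandit's own random generator, so that a seeded bandit yields a reproducible reward sequence. The sketch below assumes that design. BernoulliArm, its play(rng) method, and SketchBandit are hypothetical, and random.Random stands in for NumPy's Generator.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class BernoulliArm:
    """Hypothetical arm: pays 1.0 with probability p, otherwise 0.0."""

    p: float

    def play(self, rng: random.Random) -> float:
        return 1.0 if rng.random() < self.p else 0.0


class SketchBandit:
    """Hypothetical sketch: play(i) samples arm i using the bandit's generator."""

    def __init__(self, arms: list[BernoulliArm], seed: Optional[int] = None) -> None:
        self._arms = list(arms)
        self._rng = random.Random(seed)

    def play(self, i: int) -> float:
        return self._arms[i].play(self._rng)


bandit = SketchBandit([BernoulliArm(0.1), BernoulliArm(0.9)], seed=1)
rewards = [bandit.play(1) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # close to 0.9

# Rebuilding the bandit with the same seed replays the same rewards.
replay = SketchBandit([BernoulliArm(0.1), BernoulliArm(0.9)], seed=1)
print([replay.play(1) for _ in range(1000)] == rewards)  # True
```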
regret(choice)
Returns the regret from a given choice.
The regret is computed as the difference between the expected reward from the optimal arm and the expected reward from the chosen arm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
choice | int | The index of the chosen arm. | required |
Returns:
Type | Description |
---|---|
float | The computed regret value. |
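Written as a formula, the regret of choosing arm k is max_j μ_j − μ_k, where μ_j is the expected reward of arm j. This is zero for an optimal arm and positive otherwise. The sketch below computes it from a plain list of means. regret is a hypothetical free function, not mabby's method. Summing per-step regret over a sequence of choices gives cumulative regret, the usual way bandit strategies are compared.

```python
def regret(means: list[float], choice: int) -> float:
    """Hypothetical helper: best expected reward minus the chosen arm's expected reward."""
    return max(means) - means[choice]


# Means are exact binary fractions so the printed values are exact.
means = [0.25, 0.5, 0.75]
print(regret(means, 2))  # 0.0 (optimal arm, no regret)
print(regret(means, 0))  # 0.5

# Summing per-step regret over a sequence of choices gives cumulative regret.
choices = [0, 2, 1, 2]
cumulative = sum(regret(means, c) for c in choices)
print(cumulative)  # 0.75
```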