strategies
Multi-armed bandit strategies.
mabby provides a collection of preset bandit strategies that can be plugged into simulations. The Strategy abstract base class can also be subclassed to implement custom bandit strategies.
BetaTSStrategy(general=False)
Bases: Strategy
Thompson sampling strategy with Beta priors.
If general is False, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support [0, 1].
Parameters:

Name | Type | Description | Default |
---|---|---|---|
general | bool | Whether to use a generalized version of the strategy. | False |
Source code in mabby/strategies/thompson.py
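To illustrate the sampling rule, here is a standalone NumPy sketch of the Bernoulli (general=False) case: each arm keeps a Beta posterior, one draw is taken per arm, and the arm with the highest draw is played. The arm means and loop below are hypothetical; this is not mabby's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.6]          # hypothetical Bernoulli arm means
alphas = np.ones(2)              # Beta posterior "success" parameters
betas = np.ones(2)               # Beta posterior "failure" parameters

for _ in range(1000):
    samples = rng.beta(alphas, betas)        # one draw per arm's posterior
    arm = int(np.argmax(samples))            # play the arm with the highest draw
    reward = rng.random() < true_means[arm]  # 0/1 reward
    alphas[arm] += reward
    betas[arm] += 1 - reward

# posterior mean estimate for each arm
estimates = alphas / (alphas + betas)
```

Because draws from a poorly-explored arm's posterior are highly variable, the strategy keeps occasionally sampling weaker arms while concentrating play on the apparent best arm.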
EpsilonFirstStrategy(eps)
Bases: SemiUniformStrategy
Epsilon-first bandit strategy.
The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
eps | float | The ratio of exploration steps (must be between 0 and 1). | required |
Source code in mabby/strategies/semi_uniform.py
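The two phases can be sketched as a standalone choice function (illustrative only, not mabby's code): explore uniformly for the first eps fraction of the run, then always exploit the best current estimate.

```python
import numpy as np

def epsilon_first_choice(step, steps, eps, Qs, rng):
    """Explore uniformly during the first eps * steps steps, then exploit."""
    if step < eps * steps:
        return int(rng.integers(len(Qs)))   # pure exploration phase
    return int(np.argmax(Qs))               # pure exploitation phase

rng = np.random.default_rng(1)
Qs = [0.2, 0.7, 0.4]                        # hypothetical value estimates
choices = [epsilon_first_choice(t, 100, 0.1, Qs, rng) for t in range(100)]
# with eps=0.1 over 100 steps, every choice from step 10 on is the greedy arm
```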
EpsilonGreedyStrategy(eps)
Bases: SemiUniformStrategy
Epsilon-greedy bandit strategy.
The epsilon-greedy strategy has a fixed chance of exploration every time step.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
eps | float | The chance of exploration (must be between 0 and 1). | required |
Source code in mabby/strategies/semi_uniform.py
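The fixed-chance rule can be sketched as a standalone function (illustrative only, not mabby's code): with probability eps pick a uniformly random arm, otherwise pick the arm with the highest estimated value.

```python
import numpy as np

def epsilon_greedy_choice(eps, Qs, rng):
    """With probability eps explore a random arm, else exploit the best."""
    if rng.random() < eps:
        return int(rng.integers(len(Qs)))
    return int(np.argmax(Qs))

rng = np.random.default_rng(2)
Qs = [0.2, 0.7, 0.4]                        # hypothetical value estimates
choices = [epsilon_greedy_choice(0.1, Qs, rng) for _ in range(1000)]
# fraction of greedy picks; roughly 1 - eps + eps/3 here, since random
# exploration also lands on the greedy arm a third of the time
greedy_rate = choices.count(1) / len(choices)
```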
RandomStrategy()
Bases: SemiUniformStrategy
Random bandit strategy.
The random strategy chooses arms at random, i.e., it explores with 100% chance.
Source code in mabby/strategies/semi_uniform.py
SemiUniformStrategy()
Bases: Strategy, ABC, EnforceOverrides
Base class for semi-uniform bandit strategies.
Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.
Source code in mabby/strategies/semi_uniform.py
effective_eps()
abstractmethod
Returns the effective epsilon value.
The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may differ from the nominal epsilon value that was set.
Source code in mabby/strategies/semi_uniform.py
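As a concrete illustration of effective versus nominal epsilon, an epsilon-first schedule with nominal eps=0.1 has an effective exploration probability of 1.0 during its exploration phase and 0.0 afterwards. This standalone sketch (not mabby's implementation) shows that schedule:

```python
def effective_eps_first(step, steps, eps):
    """Effective exploration probability for an epsilon-first schedule:
    1.0 during the initial eps * steps exploration steps, 0.0 afterwards.
    Illustrative only; the nominal eps itself is never the per-step value."""
    return 1.0 if step < eps * steps else 0.0

values = [effective_eps_first(t, 100, 0.1) for t in range(100)]
```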
Strategy()
Bases: ABC, EnforceOverrides
Base class for a bandit strategy.
A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.
Source code in mabby/strategies/strategy.py
Ns: NDArray[np.uint32]
property
abstractmethod
The number of times each arm has been played.
Qs: NDArray[np.float64]
property
abstractmethod
The current estimated action values for each arm.
__repr__()
abstractmethod
Returns a string representation of the strategy.
Source code in mabby/strategies/strategy.py
agent(**kwargs)
Creates an agent following the strategy.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
**kwargs | str | Parameters for initializing the agent. | {} |
Returns:

Type | Description |
---|---|
Agent | The created agent with the strategy. |
Source code in mabby/strategies/strategy.py
choose(rng)
abstractmethod
Returns the next arm to play.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
rng | Generator | A random number generator. | required |

Returns:

Type | Description |
---|---|
int | The index of the arm to play. |
Source code in mabby/strategies/strategy.py
prime(k, steps)
abstractmethod
Primes the strategy before running a trial.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
k | int | The number of bandit arms to choose from. | required |
steps | int | The number of steps the simulation will be run for. | required |
Source code in mabby/strategies/strategy.py
update(choice, reward, rng=None)
abstractmethod
Updates internal parameter estimates based on reward observation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
choice | int | The most recent choice made. | required |
reward | float | The observed reward from the agent's most recent choice. | required |
rng | Generator \| None | A random number generator. | None |
Source code in mabby/strategies/strategy.py
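A custom strategy ties prime, choose, update, Ns, and Qs together. The following is a standalone sketch of that bookkeeping for a purely greedy rule with incremental sample-mean estimates; it is written as a plain class for illustration, not as a real Strategy subclass.

```python
import numpy as np

class GreedyTracker:
    """Standalone sketch of the prime/choose/update bookkeeping a custom
    strategy needs (illustrative; not a real mabby Strategy subclass)."""

    def prime(self, k, steps):
        self._Ns = np.zeros(k, dtype=np.uint32)   # play counts per arm
        self._Qs = np.zeros(k, dtype=np.float64)  # action-value estimates

    def choose(self, rng):
        return int(np.argmax(self._Qs))           # purely greedy choice

    def update(self, choice, reward, rng=None):
        self._Ns[choice] += 1
        # incremental sample mean: Q <- Q + (reward - Q) / N
        self._Qs[choice] += (reward - self._Qs[choice]) / self._Ns[choice]

    @property
    def Ns(self):
        return self._Ns

    @property
    def Qs(self):
        return self._Qs

s = GreedyTracker()
s.prime(k=2, steps=10)
s.update(0, 1.0)
s.update(0, 0.0)   # Qs[0] is now the running mean of the two rewards, 0.5
```

The incremental update avoids storing the full reward history: only the count and the running mean per arm are kept.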
UCB1Strategy(alpha)
Bases: Strategy
Strategy using the UCB1 bandit algorithm.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
alpha | float | The exploration parameter. | required |
Source code in mabby/strategies/ucb.py
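UCB1 picks the arm maximizing the estimated value plus an exploration bonus that grows with total time and shrinks with the arm's play count. A standalone sketch follows; the exact placement of alpha inside the bonus term is an assumption for illustration, not mabby's code.

```python
import numpy as np

def ucb1_choice(Qs, Ns, t, alpha):
    """Pick the arm maximizing Q + sqrt(alpha * ln(t) / N).
    (The placement of alpha is assumed here, not taken from mabby.)"""
    Ns = np.asarray(Ns, dtype=np.float64)
    if np.any(Ns == 0):
        return int(np.argmin(Ns))    # play every arm at least once first
    bounds = np.asarray(Qs, dtype=np.float64) + np.sqrt(alpha * np.log(t) / Ns)
    return int(np.argmax(bounds))

# an under-played arm can win despite a lower value estimate,
# because its exploration bonus dominates
choice = ucb1_choice(Qs=[0.5, 0.45], Ns=[100, 2], t=102, alpha=2.0)
```

Once both arms are well sampled, the bonuses even out and the choice reverts to the higher estimate.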