strategies

Multi-armed bandit strategies.

mabby provides a collection of preset bandit strategies that can be plugged into simulations. The Strategy abstract base class can also be subclassed to implement custom bandit strategies.

BetaTSStrategy(general=False)

Bases: Strategy

Thompson sampling strategy with Beta priors.

If general is False, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support in [0, 1].

Parameters:

Name Type Description Default
general bool

Whether to use a generalized version of the strategy.

False
Source code in mabby/strategies/thompson.py
def __init__(self, general: bool = False):
    """Initializes a Beta Thompson sampling strategy.

    If ``general`` is ``False``, rewards used for updates must be either 0 or 1.
    Otherwise, rewards must have support in [0, 1].

    Args:
        general: Whether to use a generalized version of the strategy.
    """
    self.general = general
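The update rule above can be illustrated with a minimal, self-contained sketch of Beta Thompson sampling (a hypothetical stand-in using NumPy, not mabby's actual implementation): each arm keeps a Beta posterior over its success probability, the strategy samples once from each posterior and plays the argmax, and a 0/1 reward updates the chosen arm's counts.

```python
import numpy as np

k = 3
a = np.ones(k)  # per-arm Beta "success" parameters (uniform prior)
b = np.ones(k)  # per-arm Beta "failure" parameters
rng = np.random.default_rng(0)

def choose() -> int:
    # Sample one value from each arm's posterior and play the argmax.
    return int(np.argmax(rng.beta(a, b)))

def update(choice: int, reward: float) -> None:
    # With general=False, reward must be 0 or 1: count it as a
    # success or failure for the chosen arm's posterior.
    a[choice] += reward
    b[choice] += 1 - reward

arm = choose()
update(arm, 1.0)
```

The generalized variant (general=True) instead treats a reward in [0, 1] as the probability of a Bernoulli success before updating, which is why it admits any reward with support in [0, 1].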

EpsilonFirstStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.

Parameters:

Name Type Description Default
eps float

The ratio of exploration steps (must be between 0 and 1).

required
Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-first strategy.

    Args:
        eps: The ratio of exploration steps (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps

EpsilonGreedyStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

Name Type Description Default
eps float

The chance of exploration (must be between 0 and 1).

required
Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-greedy strategy.

    Args:
        eps: The chance of exploration (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps
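The semi-uniform choice rule these strategies share can be sketched as follows (an illustrative stand-alone example, not mabby's internals): with probability eps pick a uniformly random arm, otherwise exploit the arm with the highest estimated value.

```python
import numpy as np

rng = np.random.default_rng(42)
Qs = np.array([0.2, 0.8, 0.5])  # current action-value estimates

def choose(eps: float) -> int:
    if rng.random() < eps:
        # Explore: pick an arm uniformly at random.
        return int(rng.integers(len(Qs)))
    # Exploit: play the arm with the highest estimate.
    return int(np.argmax(Qs))

greedy_arm = choose(eps=0.0)  # eps=0 always exploits
```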

RandomStrategy()

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with 100% chance.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a random strategy."""
    super().__init__()

SemiUniformStrategy()

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a semi-uniform strategy."""

effective_eps() abstractmethod

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.

Source code in mabby/strategies/semi_uniform.py
@abstractmethod
def effective_eps(self) -> float:
    """Returns the effective epsilon value.

    The effective epsilon value is the probability at the current time step that the
    bandit will explore rather than exploit. Depending on the strategy, the
    effective epsilon value may be different from the nominal epsilon value set.
    """

Strategy()

Bases: ABC, EnforceOverrides

Base class for a bandit strategy.

A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __init__(self) -> None:
    """Initializes a bandit strategy."""

Ns: NDArray[np.uint32] property abstractmethod

The number of times each arm has been played.

Qs: NDArray[np.float64] property abstractmethod

The current estimated action values for each arm.

__repr__() abstractmethod

Returns a string representation of the strategy.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __repr__(self) -> str:
    """Returns a string representation of the strategy."""

agent(**kwargs)

Creates an agent following the strategy.

Parameters:

Name Type Description Default
**kwargs str

Parameters for initializing the agent (see Agent)

{}

Returns:

Type Description
Agent

The created agent with the strategy.

Source code in mabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:
    """Creates an agent following the strategy.

    Args:
        **kwargs: Parameters for initializing the agent (see
            [`Agent`][mabby.agent.Agent])

    Returns:
        The created agent with the strategy.
    """
    return Agent(strategy=self, **kwargs)

choose(rng) abstractmethod

Returns the next arm to play.

Parameters:

Name Type Description Default
rng Generator

A random number generator.

required

Returns:

Type Description
int

The index of the arm to play.

Source code in mabby/strategies/strategy.py
@abstractmethod
def choose(self, rng: Generator) -> int:
    """Returns the next arm to play.

    Args:
        rng: A random number generator.

    Returns:
        The index of the arm to play.
    """

prime(k, steps) abstractmethod

Primes the strategy before running a trial.

Parameters:

Name Type Description Default
k int

The number of bandit arms to choose from.

required
steps int

The number of steps the simulation will be run for.

required
Source code in mabby/strategies/strategy.py
@abstractmethod
def prime(self, k: int, steps: int) -> None:
    """Primes the strategy before running a trial.

    Args:
        k: The number of bandit arms to choose from.
        steps: The number of steps the simulation will be run for.
    """

update(choice, reward, rng=None) abstractmethod

Updates internal parameter estimates based on reward observation.

Parameters:

Name Type Description Default
choice int

The most recent choice made.

required
reward float

The observed reward from the agent's most recent choice.

required
rng Generator | None

A random number generator.

None
Source code in mabby/strategies/strategy.py
@abstractmethod
def update(self, choice: int, reward: float, rng: Generator | None = None) -> None:
    """Updates internal parameter estimates based on reward observation.

    Args:
        choice: The most recent choice made.
        reward: The observed reward from the agent's most recent choice.
        rng: A random number generator.
    """

UCB1Strategy(alpha)

Bases: Strategy

Strategy using the UCB1 bandit algorithm.

Parameters:

Name Type Description Default
alpha float

The exploration parameter.

required
Source code in mabby/strategies/ucb.py
def __init__(self, alpha: float) -> None:
    """Initializes a UCB1 strategy.

    Args:
        alpha: The exploration parameter.
    """
    if alpha < 0:
        raise ValueError("alpha must be greater than or equal to 0")
    self.alpha = alpha
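A sketch of how a UCB1-style index could be computed with this alpha (an assumed form of the bonus term; consult mabby's source for the exact variant it uses): each arm's score is its estimated value plus an exploration bonus that grows with total plays and shrinks as that arm is played more often.

```python
import numpy as np

alpha = 2.0
t = 100                          # total plays so far
Qs = np.array([0.5, 0.6, 0.4])   # estimated action values per arm
Ns = np.array([40, 50, 10])      # play counts per arm

# Upper confidence bound per arm: value estimate plus an
# alpha-scaled bonus that favors under-played arms.
ucb = Qs + alpha * np.sqrt(np.log(t) / Ns)
best = int(np.argmax(ucb))
```

Note how the rarely played arm 2 wins despite its lower value estimate: its small play count inflates the exploration bonus, which is exactly the optimism-under-uncertainty behavior UCB1 is built on.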