strategies

Multi-armed bandit strategies.

mabby provides a collection of preset bandit strategies that can be plugged into simulations. The Strategy abstract base class can also be subclassed to implement custom bandit strategies.

BetaTSStrategy(general=False)

Bases: Strategy

Thompson sampling strategy with Beta priors.

If general is False, rewards used for updates must be either 0 or 1. Otherwise, rewards must have support in [0, 1].

Parameters:

Name Type Description Default
general bool

Whether to use a generalized version of the strategy.

False
Source code in mabby/strategies/thompson.py
def __init__(self, general: bool = False):
    """Initializes a Beta Thompson sampling strategy.

    If ``general`` is ``False``, rewards used for updates must be either 0 or 1.
    Otherwise, rewards must have support in [0, 1].

    Args:
        general: Whether to use a generalized version of the strategy.
    """
    self.general = general
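The update rule above can be illustrated with a minimal, self-contained sketch of Beta Thompson sampling (a hypothetical stand-in using NumPy, not mabby's actual implementation): each arm keeps a Beta posterior over its success probability, the strategy samples once from each posterior and plays the argmax, and a 0/1 reward updates the chosen arm's counts.

```python
import numpy as np

k = 3
a = np.ones(k)  # per-arm Beta "success" parameters (uniform prior)
b = np.ones(k)  # per-arm Beta "failure" parameters
rng = np.random.default_rng(0)

def choose() -> int:
    # Sample one value from each arm's posterior and play the argmax.
    return int(np.argmax(rng.beta(a, b)))

def update(choice: int, reward: float) -> None:
    # With general=False, reward must be 0 or 1: count it as a
    # success or failure for the chosen arm's posterior.
    a[choice] += reward
    b[choice] += 1 - reward

arm = choose()
update(arm, 1.0)
```

The generalized variant (general=True) instead treats a reward in [0, 1] as the probability of a Bernoulli success before updating, which is why it admits any reward with support in [0, 1].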

EpsilonFirstStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.

Parameters:

Name Type Description Default
eps float

The ratio of exploration steps (must be between 0 and 1).

required
Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-first strategy.

    Args:
        eps: The ratio of exploration steps (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps

EpsilonGreedyStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

Name Type Description Default
eps float

The chance of exploration (must be between 0 and 1).

required
Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-greedy strategy.

    Args:
        eps: The chance of exploration (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps
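The semi-uniform choice rule these strategies share can be sketched as follows (an illustrative stand-alone example, not mabby's internals): with probability eps pick a uniformly random arm, otherwise exploit the arm with the highest estimated value.

```python
import numpy as np

rng = np.random.default_rng(42)
Qs = np.array([0.2, 0.8, 0.5])  # current action-value estimates

def choose(eps: float) -> int:
    if rng.random() < eps:
        # Explore: pick an arm uniformly at random.
        return int(rng.integers(len(Qs)))
    # Exploit: play the arm with the highest estimate.
    return int(np.argmax(Qs))

greedy_arm = choose(eps=0.0)  # eps=0 always exploits
```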

RandomStrategy()

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with 100% chance.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a random strategy."""
    super().__init__()

SemiUniformStrategy()

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a semi-uniform strategy."""

effective_eps() abstractmethod

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.

Source code in mabby/strategies/semi_uniform.py
@abstractmethod
def effective_eps(self) -> float:
    """Returns the effective epsilon value.

    The effective epsilon value is the probability at the current time step that the
    bandit will explore rather than exploit. Depending on the strategy, the
    effective epsilon value may be different from the nominal epsilon value set.
    """

Strategy()

Bases: ABC, EnforceOverrides

Base class for a bandit strategy.

A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __init__(self) -> None:
    """Initializes a bandit strategy."""

Ns: NDArray[np.uint32] property abstractmethod

The number of times each arm has been played.

Qs: NDArray[np.float64] property abstractmethod

The current estimated action values for each arm.

__repr__() abstractmethod

Returns a string representation of the strategy.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __repr__(self) -> str:
    """Returns a string representation of the strategy."""

agent(**kwargs)

Creates an agent following the strategy.

Parameters:

Name Type Description Default
**kwargs str

Parameters for initializing the agent (see Agent)

{}

Returns:

Type Description
Agent

The created agent with the strategy.

Source code in mabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:
    """Creates an agent following the strategy.

    Args:
        **kwargs: Parameters for initializing the agent (see
            [`Agent`][mabby.agent.Agent])

    Returns:
        The created agent with the strategy.
    """
    return Agent(strategy=self, **kwargs)

choose(rng) abstractmethod

Returns the next arm to play.

Parameters:

Name Type Description Default
rng Generator

A random number generator.

required

Returns:

Type Description
int

The index of the arm to play.

Source code in mabby/strategies/strategy.py
@abstractmethod
def choose(self, rng: Generator) -> int:
    """Returns the next arm to play.

    Args:
        rng: A random number generator.

    Returns:
        The index of the arm to play.
    """

prime(k, steps) abstractmethod

Primes the strategy before running a trial.

Parameters:

Name Type Description Default
k int

The number of bandit arms to choose from.

required
steps int

The number of steps the simulation will be run for.

required
Source code in mabby/strategies/strategy.py
@abstractmethod
def prime(self, k: int, steps: int) -> None:
    """Primes the strategy before running a trial.

    Args:
        k: The number of bandit arms to choose from.
        steps: The number of steps the simulation will be run for.
    """

update(choice, reward, rng=None) abstractmethod

Updates internal parameter estimates based on reward observation.

Parameters:

Name Type Description Default
choice int

The most recent choice made.

required
reward float

The observed reward from the agent's most recent choice.

required
rng Generator | None

A random number generator.

None
Source code in mabby/strategies/strategy.py
@abstractmethod
def update(self, choice: int, reward: float, rng: Generator | None = None) -> None:
    """Updates internal parameter estimates based on reward observation.

    Args:
        choice: The most recent choice made.
        reward: The observed reward from the agent's most recent choice.
        rng: A random number generator.
    """

UCB1Strategy(alpha)

Bases: Strategy

Strategy using the UCB1 bandit algorithm.

Parameters:

Name Type Description Default
alpha float

The exploration parameter.

required
Source code in mabby/strategies/ucb.py
def __init__(self, alpha: float) -> None:
    """Initializes a UCB1 strategy.

    Args:
        alpha: The exploration parameter.
    """
    if alpha < 0:
        raise ValueError("alpha must be greater than or equal to 0")
    self.alpha = alpha
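A sketch of how a UCB1-style index could be computed with this alpha (an assumed form of the bonus term; consult mabby's source for the exact variant it uses): each arm's score is its estimated value plus an exploration bonus that grows with total plays and shrinks as that arm is played more often.

```python
import numpy as np

alpha = 2.0
t = 100                          # total plays so far
Qs = np.array([0.5, 0.6, 0.4])   # estimated action values per arm
Ns = np.array([40, 50, 10])      # play counts per arm

# Upper confidence bound per arm: value estimate plus an
# alpha-scaled bonus that favors under-played arms.
ucb = Qs + alpha * np.sqrt(np.log(t) / Ns)
best = int(np.argmax(ucb))
```

Note how the rarely played arm 2 wins despite its lower value estimate: its small play count inflates the exploration bonus, which is exactly the optimism-under-uncertainty behavior UCB1 is built on.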