strategy

Provides Strategy class.

Strategy()

Bases: ABC, EnforceOverrides

Base class for a bandit strategy.

A strategy provides the computational logic for choosing which bandit arms to play and updating parameter estimates.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __init__(self) -> None:
    """Initializes a bandit strategy."""

Ns: NDArray[np.uint32] property abstractmethod

The number of times each arm has been played.

Qs: NDArray[np.float64] property abstractmethod

The current estimated action values for each arm.

__repr__() abstractmethod

Returns a string representation of the strategy.

Source code in mabby/strategies/strategy.py
@abstractmethod
def __repr__(self) -> str:
    """Returns a string representation of the strategy."""

agent(**kwargs)

Creates an agent following the strategy.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **kwargs | str | Parameters for initializing the agent (see Agent) | {} |

Returns:

| Type | Description |
| --- | --- |
| Agent | The created agent with the strategy. |

Source code in mabby/strategies/strategy.py
def agent(self, **kwargs: str) -> Agent:
    """Creates an agent following the strategy.

    Args:
        **kwargs: Parameters for initializing the agent (see
            [`Agent`][mabby.agent.Agent])

    Returns:
        The created agent with the strategy.
    """
    return Agent(strategy=self, **kwargs)
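
The forwarding done by `agent` can be illustrated without the library installed. The sketch below uses local stand-in classes, not mabby's own: only the `strategy` keyword comes from the source above, and the `name` parameter is a hypothetical placeholder for whatever arguments `Agent` actually accepts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Agent:
    """Stand-in for the library's Agent (assumption: fields are illustrative)."""

    strategy: Strategy
    name: str | None = None  # hypothetical parameter, not taken from the source


class Strategy:
    """Stand-in exposing only the `agent` factory method."""

    def agent(self, **kwargs: Any) -> Agent:
        # Same forwarding as the documented method: the strategy passes
        # itself as `strategy` and hands every other keyword to Agent.
        return Agent(strategy=self, **kwargs)


strategy = Strategy()
agent = strategy.agent(name="demo")
```

The created agent holds a reference to the same strategy object, so state kept on the strategy is shared rather than copied.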

choose(rng) abstractmethod

Returns the next arm to play.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| rng | Generator | A random number generator. | required |

Returns:

| Type | Description |
| --- | --- |
| int | The index of the arm to play. |

Source code in mabby/strategies/strategy.py
@abstractmethod
def choose(self, rng: Generator) -> int:
    """Returns the next arm to play.

    Args:
        rng: A random number generator.

    Returns:
        The index of the arm to play.
    """

prime(k, steps) abstractmethod

Primes the strategy before running a trial.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| k | int | The number of bandit arms to choose from. | required |
| steps | int | The number of steps the simulation will be run for. | required |

Source code in mabby/strategies/strategy.py
@abstractmethod
def prime(self, k: int, steps: int) -> None:
    """Primes the strategy before running a trial.

    Args:
        k: The number of bandit arms to choose from.
        steps: The number of steps the simulation will be run for.
    """

update(choice, reward, rng=None) abstractmethod

Updates internal parameter estimates based on reward observation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| choice | int | The most recent choice made. | required |
| reward | float | The observed reward from the agent's most recent choice. | required |
| rng | Generator \| None | A random number generator. | None |

Source code in mabby/strategies/strategy.py
@abstractmethod
def update(self, choice: int, reward: float, rng: Generator | None = None) -> None:
    """Updates internal parameter estimates based on reward observation.

    Args:
        choice: The most recent choice made.
        reward: The observed reward from the agent's most recent choice.
        rng: A random number generator.
    """