semi_uniform

Provides implementations of semi-uniform bandit strategies.

Semi-uniform strategies choose to either explore or exploit at each time step. When exploring, a random arm is played. When exploiting, the arm with the greatest estimated action value is played. The chance of exploration, epsilon, is computed differently by each semi-uniform strategy.
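The explore-or-exploit choice described above can be sketched in a few lines. This is a minimal illustration, not mabby's implementation: the function name `choose_arm`, its parameters, the uniform choice among arms when exploring, and the tie-breaking rule (first maximum wins) are all assumptions made for the example.

```python
import random


def choose_arm(estimates, eps, rng):
    """Pick an arm index: explore with probability eps, otherwise exploit.

    Hypothetical helper for illustration; not part of mabby.
    """
    if rng.random() < eps:
        # Explore: assume every arm is equally likely to be played.
        return rng.randrange(len(estimates))
    # Exploit: play the arm with the greatest estimated action value.
    # max() returns the first maximum, so ties go to the lowest index.
    return max(range(len(estimates)), key=lambda i: estimates[i])


rng = random.Random(0)
estimates = [0.1, 0.5, 0.3]
# With eps=0.0 the exploration branch is never taken, so index 1 is chosen.
print(choose_arm(estimates, eps=0.0, rng=rng))
```

Passing the random number generator in explicitly keeps the sketch reproducible. A real strategy would also update its estimates after each play.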

EpsilonFirstStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase. The eps parameter sets the fraction of steps spent in the exploration phase.

Parameters:

Name  Type   Description                                                Default
eps   float  The ratio of exploration steps (must be between 0 and 1).  required

Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-first strategy.

    Args:
        eps: The ratio of exploration steps (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps
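
One plausible reading of the two-phase behavior is that the effective chance of exploration is 1 during the first `eps` fraction of steps and 0 afterwards. The sketch below illustrates that reading only. The function `epsilon_first_effective_eps`, the `step` and `total_steps` parameters, and the assumption that the total number of steps is known in advance are hypothetical and are not taken from mabby's source.

```python
def epsilon_first_effective_eps(eps: float, step: int, total_steps: int) -> float:
    """Return 1.0 during the exploration phase and 0.0 afterwards.

    Hypothetical helper; assumes steps are numbered from 0 and that the
    total number of steps is known before the run starts.
    """
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    # The first eps * total_steps steps are pure exploration.
    return 1.0 if step < eps * total_steps else 0.0


# With eps=0.25 and 8 steps, the first 2 steps explore and the rest exploit.
schedule = [epsilon_first_effective_eps(0.25, t, 8) for t in range(8)]
print(schedule)
```

Under this reading, the effective epsilon is never equal to the nominal `eps` unless `eps` is 0 or 1, which is one way a strategy's effective and nominal values can differ.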

EpsilonGreedyStrategy(eps)

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

Name  Type   Description                                           Default
eps   float  The chance of exploration (must be between 0 and 1).  required

Source code in mabby/strategies/semi_uniform.py
def __init__(self, eps: float) -> None:
    """Initializes an epsilon-greedy strategy.

    Args:
        eps: The chance of exploration (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps
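
To make the fixed chance of exploration concrete, here is a small standalone simulation of an epsilon-greedy loop on Bernoulli arms. It does not use mabby. The function `run_epsilon_greedy`, the Bernoulli reward model, and the incremental sample-average update of the estimates are assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(true_means, eps, steps, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    Returns the play count and the estimated action value for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < eps:
            # Explore with the same probability eps at every time step.
            arm = rng.randrange(n_arms)
        else:
            # Exploit the arm with the greatest current estimate.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        # Assumed reward model: 1.0 with probability true_means[arm], else 0.0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update of the estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = run_epsilon_greedy([0.2, 0.8], eps=0.1, steps=1000)
print(counts, estimates)
```

Unlike the epsilon-first strategy, nothing here depends on the current step number or on the length of the run.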

RandomStrategy()

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with 100% chance.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a random strategy."""
    super().__init__()

SemiUniformStrategy()

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement effective_eps to compute the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py
def __init__(self) -> None:
    """Initializes a semi-uniform strategy."""

effective_eps() abstractmethod

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.

Source code in mabby/strategies/semi_uniform.py
@abstractmethod
def effective_eps(self) -> float:
    """Returns the effective epsilon value.

    The effective epsilon value is the probability at the current time step that the
    bandit will explore rather than exploit. Depending on the strategy, the
    effective epsilon value may be different from the nominal epsilon value set.
    """
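
The abstract-method contract can be illustrated with standalone stand-in classes. The sketch below uses only the standard library's `abc` module. The class names `SketchSemiUniform`, `SketchRandom`, and `SketchEpsilonGreedy` are hypothetical, and the sketch does not reproduce mabby's `Strategy` base class or the `EnforceOverrides` behavior.

```python
from abc import ABC, abstractmethod


class SketchSemiUniform(ABC):
    """Stand-in base class for illustration; not mabby's SemiUniformStrategy."""

    @abstractmethod
    def effective_eps(self) -> float:
        """Returns the probability of exploring at the current time step."""


class SketchRandom(SketchSemiUniform):
    def effective_eps(self) -> float:
        # A random strategy always explores.
        return 1.0


class SketchEpsilonGreedy(SketchSemiUniform):
    def __init__(self, eps: float) -> None:
        if eps < 0 or eps > 1:
            raise ValueError("eps must be between 0 and 1")
        self.eps = eps

    def effective_eps(self) -> float:
        # The effective epsilon equals the nominal epsilon at every step.
        return self.eps


# The base class cannot be instantiated because effective_eps is abstract.
try:
    SketchSemiUniform()
    base_instantiable = True
except TypeError:
    base_instantiable = False

print(base_instantiable, SketchRandom().effective_eps())
```

Keeping the exploration probability behind a single method lets the explore-or-exploit logic live in one place while each subclass only decides how likely exploration is.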