semi_uniform

Provides implementations of semi-uniform bandit strategies.

At each time step, a semi-uniform strategy either explores or exploits. When exploring, it plays a random arm. When exploiting, it plays the arm with the greatest estimated action value. Epsilon, the chance of exploration, is computed differently by each semi-uniform strategy.
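
The explore/exploit decision described above can be sketched as follows. This is a minimal illustration, not mabby's implementation: the function name `choose_arm` and its parameters are hypothetical, and exploration is assumed to pick uniformly among all arms.

```python
import random


def choose_arm(estimates, eps, rng):
    """Hypothetical helper: explore with probability eps, otherwise exploit."""
    if rng.random() < eps:
        # Explore: play a random arm (assumed uniform over all arms).
        return rng.randrange(len(estimates))
    # Exploit: play the arm with the greatest estimated action value.
    return max(range(len(estimates)), key=estimates.__getitem__)


rng = random.Random(0)
greedy_pick = choose_arm([0.1, 0.9, 0.5], eps=0.0, rng=rng)   # never explores
random_pick = choose_arm([0.1, 0.9, 0.5], eps=1.0, rng=rng)   # always explores
```

With `eps=0.0` the helper always returns the index of the best estimate, and with `eps=1.0` it always returns a random index.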

`EpsilonFirstStrategy(eps)`

Bases: SemiUniformStrategy

Epsilon-first bandit strategy.

The epsilon-first strategy has a pure exploration phase followed by a pure exploitation phase.
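
One way to picture the two phases is as a schedule for the chance of exploration. The sketch below assumes the total number of steps is known in advance and that the exploration phase covers the first `eps * total_steps` steps; the helper `epsilon_first_eps` is hypothetical and is not taken from mabby's source.

```python
def epsilon_first_eps(eps, step, total_steps):
    """Hypothetical helper: chance of exploration at a given 0-based step."""
    # Assumed phase boundary: explore during the first eps * total_steps steps.
    return 1.0 if step < eps * total_steps else 0.0


# With eps = 0.2 over 10 steps, the first two steps explore and the rest exploit.
schedule = [epsilon_first_eps(0.2, t, 10) for t in range(10)]
```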

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The ratio of exploration steps (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py

def __init__(self, eps: float) -> None:
    """Initializes an epsilon-first strategy.

    Args:
        eps: The ratio of exploration steps (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps

`EpsilonGreedyStrategy(eps)`

Bases: SemiUniformStrategy

Epsilon-greedy bandit strategy.

The epsilon-greedy strategy has a fixed chance of exploration every time step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `eps` | `float` | The chance of exploration (must be between 0 and 1). | required |

Source code in mabby/strategies/semi_uniform.py

def __init__(self, eps: float) -> None:
    """Initializes an epsilon-greedy strategy.

    Args:
        eps: The chance of exploration (must be between 0 and 1).
    """
    super().__init__()
    if eps < 0 or eps > 1:
        raise ValueError("eps must be between 0 and 1")
    self.eps = eps

`RandomStrategy()`

Bases: SemiUniformStrategy

Random bandit strategy.

The random strategy chooses arms at random, i.e., it explores with 100% chance.

Source code in mabby/strategies/semi_uniform.py

def __init__(self) -> None:
    """Initializes a random strategy."""
    super().__init__()

`SemiUniformStrategy()`

Bases: Strategy, ABC, EnforceOverrides

Base class for semi-uniform bandit strategies.

Every semi-uniform strategy must implement `effective_eps`, which computes the chance of exploration at each time step.

Source code in mabby/strategies/semi_uniform.py

def __init__(self) -> None:
    """Initializes a semi-uniform strategy."""

`effective_eps()` `abstractmethod`

Returns the effective epsilon value.

The effective epsilon value is the probability at the current time step that the bandit will explore rather than exploit. Depending on the strategy, the effective epsilon value may be different from the nominal epsilon value set.
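
As a rough illustration of this contract, the sketch below defines a stand-in abstract base class and two subclasses. All class names here are hypothetical and the bodies are assumptions based only on the descriptions on this page, not on mabby's actual implementation.

```python
from abc import ABC, abstractmethod


class SketchSemiUniform(ABC):
    """Hypothetical stand-in for a semi-uniform base class (not mabby's code)."""

    @abstractmethod
    def effective_eps(self) -> float:
        """Returns the probability of exploring at the current time step."""


class SketchGreedy(SketchSemiUniform):
    """Fixed chance of exploration: effective and nominal values coincide."""

    def __init__(self, eps: float) -> None:
        self.eps = eps

    def effective_eps(self) -> float:
        return self.eps


class SketchRandom(SketchSemiUniform):
    """Always explores, so there is no nominal epsilon to set."""

    def effective_eps(self) -> float:
        return 1.0


greedy_eps = SketchGreedy(0.1).effective_eps()
random_eps = SketchRandom().effective_eps()
```

Because `effective_eps` is abstract, instantiating `SketchSemiUniform` directly raises a `TypeError`; only subclasses that override the method can be created.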

Source code in mabby/strategies/semi_uniform.py

@abstractmethod
def effective_eps(self) -> float:
    """Returns the effective epsilon value.

    The effective epsilon value is the probability at the current time step that the
    bandit will explore rather than exploit. Depending on the strategy, the
    effective epsilon value may be different from the nominal epsilon value set.
    """
