
agent

Provides the Agent class for bandit simulations.

Agent(strategy, name=None)

Agent in a multi-armed bandit simulation.

An agent represents an autonomous entity in a bandit simulation. It wraps a specified strategy and provides an interface for each step of the decision-making process: making a choice, then updating internal parameter estimates based on the reward observed for that choice.

Parameters:

Name      Type        Description                      Default
strategy  Strategy    The bandit strategy to use.      required
name      str | None  An optional name for the agent.  None
Source code in mabby/agent.py
def __init__(self, strategy: Strategy, name: str | None = None):
    """Initializes an agent with a given strategy.

    Args:
        strategy: The bandit strategy to use.
        name: An optional name for the agent.
    """
    self.strategy: Strategy = strategy  #: The bandit strategy to use
    self._name = name
    self._primed = False
    self._choice: int | None = None
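
The decision cycle outlined in the class description (prime once, then alternate choosing and updating) can be sketched end to end. This is a minimal illustration rather than mabby's own code: `Agent` is a condensed copy of the methods listed on this page, while `GreedyStrategy`, `run_trial`, and the stand-in `AgentUsageError` are hypothetical names, and Python's `random.Random` substitutes for the NumPy `Generator` so that the sketch needs only the standard library.

```python
import random


class AgentUsageError(Exception):
    """Stand-in for mabby's AgentUsageError (assumption, not the real class)."""


class GreedyStrategy:
    """Hypothetical strategy: plays the arm with the highest estimate."""

    def prime(self, k, steps):
        self.Qs = [0.0] * k  # estimated value of each arm
        self.Ns = [0] * k    # play count of each arm

    def choose(self, rng):
        best = max(self.Qs)
        # Break ties at random so that untried arms can still be picked.
        return rng.choice([i for i, q in enumerate(self.Qs) if q == best])

    def update(self, choice, reward, rng):
        self.Ns[choice] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        self.Qs[choice] += (reward - self.Qs[choice]) / self.Ns[choice]


class Agent:
    """Condensed copy of the methods listed on this page."""

    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self._name = name
        self._primed = False
        self._choice = None

    def prime(self, k, steps, rng):
        self._primed = True
        self._choice = None
        self._rng = rng
        self.strategy.prime(k, steps)

    def choose(self):
        if not self._primed:
            raise AgentUsageError("choose() can only be called on a primed agent")
        self._choice = self.strategy.choose(self._rng)
        return self._choice

    def update(self, reward):
        if self._choice is None:
            raise AgentUsageError("update() can only be called after choose()")
        self.strategy.update(self._choice, reward, self._rng)
        self._choice = None


def run_trial(agent, arm_means, steps, seed=0):
    """Runs one trial against Gaussian arms and returns the total reward."""
    rng = random.Random(seed)
    agent.prime(len(arm_means), steps, rng)
    total = 0.0
    for _ in range(steps):
        arm = agent.choose()                     # 1. pick an arm
        reward = rng.gauss(arm_means[arm], 1.0)  # 2. observe a noisy reward
        agent.update(reward)                     # 3. feed it back to the strategy
        total += reward
    return total
```

Calling `run_trial(Agent(GreedyStrategy()), [0.1, 0.5, 0.9], steps=50)` plays 50 steps, after which the strategy's play counts sum to 50.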

Ns: NDArray[np.uint32] property

The number of times the agent has played each arm.

The play counts are only available after the agent has been primed.

Returns:

Type                Description
NDArray[np.uint32]  An array of the play counts of each arm.

Raises:

Type             Description
AgentUsageError  If the agent has not been primed.

Qs: NDArray[np.float64] property

The agent's current estimated action values (Q-values).

The action values are only available after the agent has been primed.

Returns:

Type                 Description
NDArray[np.float64]  An array of the action values of each arm.

Raises:

Type             Description
AgentUsageError  If the agent has not been primed.
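
The page does not list the source of the `Ns` and `Qs` properties, so the following is only a sketch of how such guarded properties might look. It assumes that the estimates live on the strategy under the same names; the error messages, the `ZeroStrategy` class, and the stand-in `AgentUsageError` are hypothetical, and plain lists replace NumPy arrays to keep the example dependency-free.

```python
class AgentUsageError(Exception):
    """Stand-in for mabby's AgentUsageError (assumption, not the real class)."""


class ZeroStrategy:
    """Hypothetical strategy that allocates its estimates when primed."""

    def prime(self, k, steps):
        self.Qs = [0.0] * k  # estimated action values
        self.Ns = [0] * k    # play counts


class Agent:
    def __init__(self, strategy):
        self.strategy = strategy
        self._primed = False

    def prime(self, k, steps, rng=None):
        self._primed = True
        self.strategy.prime(k, steps)

    @property
    def Qs(self):
        # Guard: estimates do not exist until prime() has sized them.
        if not self._primed:
            raise AgentUsageError("Qs are only available on a primed agent")
        return self.strategy.Qs  # assumption: delegated to the strategy

    @property
    def Ns(self):
        if not self._primed:
            raise AgentUsageError("Ns are only available on a primed agent")
        return self.strategy.Ns  # assumption: delegated to the strategy


agent = Agent(ZeroStrategy())
try:
    agent.Qs
except AgentUsageError as err:
    print(f"before priming: {err}")

agent.prime(k=3, steps=10)
print(agent.Qs, agent.Ns)  # [0.0, 0.0, 0.0] [0, 0, 0]
```

The guard exists because the number of arms, and therefore the length of both arrays, is only known once `prime` has been called.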

__repr__()

Returns the agent's string representation.

Uses the agent's name if set. Otherwise, the string representation of the agent's strategy is used by default.

Source code in mabby/agent.py
def __repr__(self) -> str:
    """Returns the agent's string representation.

    Uses the agent's name if set. Otherwise, the string representation of the
    agent's strategy is used by default.
    """
    if self._name is None:
        return str(self.strategy)
    return self._name
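
As a small illustration of the fallback, the sketch below copies `__repr__` from the listing above into a stripped-down `Agent` and pairs it with a hypothetical `EpsilonGreedy` class whose only job is to supply a string form. The class name and its label format are assumptions made for the example, not mabby's actual output.

```python
class EpsilonGreedy:
    """Hypothetical strategy; only its string form matters here."""

    def __init__(self, eps):
        self.eps = eps

    def __str__(self):
        return f"epsilon-greedy (eps={self.eps})"


class Agent:
    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self._name = name

    def __repr__(self):
        # Same logic as the listing: fall back to the strategy's string form.
        if self._name is None:
            return str(self.strategy)
        return self._name


agents = [
    Agent(EpsilonGreedy(0.1)),                   # unnamed: uses the strategy
    Agent(EpsilonGreedy(0.1), name="baseline"),  # named: uses the name
]
labels = [repr(agent) for agent in agents]
print(labels)  # ['epsilon-greedy (eps=0.1)', 'baseline']
```

Giving agents explicit names is one way to tell apart two agents that share the same strategy configuration.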

choose()

Returns the agent's next choice based on its strategy.

This method can only be called on a primed agent.

Returns:

Type  Description
int   The index of the arm chosen by the agent.

Raises:

Type             Description
AgentUsageError  If the agent has not been primed.

Source code in mabby/agent.py
def choose(self) -> int:
    """Returns the agent's next choice based on its strategy.

    This method can only be called on a primed agent.

    Returns:
        The index of the arm chosen by the agent.

    Raises:
        AgentUsageError: If the agent has not been primed.
    """
    if not self._primed:
        raise AgentUsageError("choose() can only be called on a primed agent")
    self._choice = self.strategy.choose(self._rng)
    return self._choice

prime(k, steps, rng)

Primes the agent before running a trial.

Parameters:

Name   Type       Description                                              Default
k      int        The number of bandit arms for the agent to choose from.  required
steps  int        The number of steps the simulation will run for.         required
rng    Generator  A random number generator.                               required
Source code in mabby/agent.py
def prime(self, k: int, steps: int, rng: Generator) -> None:
    """Primes the agent before running a trial.

    Args:
        k: The number of bandit arms for the agent to choose from.
        steps: The number of steps the simulation will run for.
        rng: A random number generator.
    """
    self._primed = True
    self._choice = None
    self._rng = rng
    self.strategy.prime(k, steps)
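
Because `prime` stores the generator it is given and clears any pending choice, re-priming an agent with an identically seeded generator should replay the same sequence of choices. The sketch below illustrates this under stated assumptions: `UniformStrategy`, `trial_choices`, and the stand-in `AgentUsageError` are hypothetical, `Agent` is a condensed copy of the listings on this page, and `random.Random` substitutes for the NumPy `Generator`.

```python
import random


class AgentUsageError(Exception):
    """Stand-in for mabby's AgentUsageError (assumption, not the real class)."""


class UniformStrategy:
    """Hypothetical strategy: picks one of k arms uniformly at random."""

    def prime(self, k, steps):
        self.k = k

    def choose(self, rng):
        return rng.randrange(self.k)


class Agent:
    """Condensed copy of the constructor, prime(), and choose() listings."""

    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self._name = name
        self._primed = False
        self._choice = None

    def prime(self, k, steps, rng):
        self._primed = True
        self._choice = None  # discard any choice left over from a previous trial
        self._rng = rng
        self.strategy.prime(k, steps)

    def choose(self):
        if not self._primed:
            raise AgentUsageError("choose() can only be called on a primed agent")
        self._choice = self.strategy.choose(self._rng)
        return self._choice


def trial_choices(agent, seed, k=5, steps=10):
    """Primes the agent with a freshly seeded generator and records its choices."""
    agent.prime(k, steps, random.Random(seed))
    # choose() in the listing only checks priming, so it can be called repeatedly.
    return [agent.choose() for _ in range(steps)]


agent = Agent(UniformStrategy())
first = trial_choices(agent, seed=42)
second = trial_choices(agent, seed=42)
print(first == second)  # True: same seed, same sequence of choices
```

Passing the generator in from outside, rather than creating one inside the agent, leaves control over seeding with whoever runs the trial.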

update(reward)

Updates the agent's internal parameter estimates.

This method can only be called if the agent has previously made a choice, and an update based on that choice has not already been made.

Parameters:

Name    Type   Description                                               Default
reward  float  The observed reward from the agent's most recent choice.  required

Raises:

Type             Description
AgentUsageError  If the agent has not previously made a choice.

Source code in mabby/agent.py
def update(self, reward: float) -> None:
    """Updates the agent's internal parameter estimates.

    This method can only be called if the agent has previously made a choice, and
    an update based on that choice has not already been made.

    Args:
        reward: The observed reward from the agent's most recent choice.

    Raises:
        AgentUsageError: If the agent has not previously made a choice.
    """
    if self._choice is None:
        raise AgentUsageError("update() can only be called after choose()")
    self.strategy.update(self._choice, reward, self._rng)
    self._choice = None
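
The one-update-per-choice rule follows from `update` resetting `_choice` to `None`. The sketch below exercises it with a condensed copy of the listings on this page; `RecordingStrategy`, `try_update`, and the stand-in `AgentUsageError` are hypothetical helpers introduced only for the demonstration, and `random.Random` substitutes for the NumPy `Generator`.

```python
import random


class AgentUsageError(Exception):
    """Stand-in for mabby's AgentUsageError (assumption, not the real class)."""


class RecordingStrategy:
    """Hypothetical strategy: always plays arm 0 and logs every update."""

    def prime(self, k, steps):
        self.history = []

    def choose(self, rng):
        return 0

    def update(self, choice, reward, rng):
        self.history.append((choice, reward))


class Agent:
    """Condensed copy of the methods listed on this page."""

    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self._name = name
        self._primed = False
        self._choice = None

    def prime(self, k, steps, rng):
        self._primed = True
        self._choice = None
        self._rng = rng
        self.strategy.prime(k, steps)

    def choose(self):
        if not self._primed:
            raise AgentUsageError("choose() can only be called on a primed agent")
        self._choice = self.strategy.choose(self._rng)
        return self._choice

    def update(self, reward):
        if self._choice is None:
            raise AgentUsageError("update() can only be called after choose()")
        self.strategy.update(self._choice, reward, self._rng)
        self._choice = None  # the choice is consumed by this update


def try_update(agent, reward):
    """Returns True if update() succeeded, False if it raised AgentUsageError."""
    try:
        agent.update(reward)
    except AgentUsageError:
        return False
    return True


agent = Agent(RecordingStrategy())
agent.prime(k=2, steps=10, rng=random.Random(0))

print(try_update(agent, 1.0))  # False: no choice has been made yet
agent.choose()
print(try_update(agent, 1.0))  # True: consumes the pending choice
print(try_update(agent, 1.0))  # False: that choice was already used
```

Only the successful call reaches the strategy, so each reward is credited to exactly one choice.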