agent
Provides the Agent class for bandit simulations.
Agent(strategy, name=None)
Agent in a multi-armed bandit simulation.
An agent represents an autonomous entity in a bandit simulation. It wraps a specified strategy and provides an interface for each part of the decision-making process: making a choice, then updating internal parameter estimates based on the reward observed from that choice.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
strategy | Strategy | The bandit strategy to use. | required |
name | str \| None | An optional name for the agent. | None |
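The choose-then-update cycle described above can be sketched with stand-in classes. This is a minimal illustration, not mabby's implementation: `RandomStrategy`, `ToyAgent`, and `run_trial` are hypothetical names, the strategy methods are assumed to mirror the agent's, and the standard-library `random.Random` stands in for the random number generator.

```python
import random


class RandomStrategy:
    """Hypothetical strategy: picks arms uniformly at random."""

    def prime(self, k, steps, rng):
        self.k = k
        self.rng = rng
        self.plays = [0] * k  # how often each arm was played

    def choose(self):
        return self.rng.randrange(self.k)

    def update(self, choice, reward):
        self.plays[choice] += 1


class ToyAgent:
    """Stand-in for Agent: forwards each step to the wrapped strategy."""

    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self.name = name
        self._choice = None

    def prime(self, k, steps, rng):
        self.strategy.prime(k, steps, rng)

    def choose(self):
        # Remember the choice so update() knows which arm the reward belongs to.
        self._choice = self.strategy.choose()
        return self._choice

    def update(self, reward):
        self.strategy.update(self._choice, reward)


def run_trial(agent, arm_means, steps, seed=0):
    """Prime the agent, then alternate choose() and update() for each step."""
    rng = random.Random(seed)
    agent.prime(len(arm_means), steps, rng)
    total_reward = 0.0
    for _ in range(steps):
        arm = agent.choose()
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy reward from the chosen arm
        agent.update(reward)
        total_reward += reward
    return total_reward


agent = ToyAgent(RandomStrategy(), name="random")
run_trial(agent, arm_means=[0.0, 1.0, 2.0], steps=100)
print(sum(agent.strategy.plays))  # 100
```

The agent itself holds no bandit logic here; it only tracks the pending choice and delegates, which is the wrapping role the description attributes to it.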
Source code in mabby/agent.py
Ns: NDArray[np.uint32]
property
The number of times the agent has played each arm.
The play counts are only available after the agent has been primed.
Returns:
Type | Description |
---|---|
NDArray[np.uint32] | An array of the play counts of each arm. |
Raises:
Type | Description |
---|---|
AgentUsageError | If the agent has not been primed. |
Qs: NDArray[np.float64]
property
The agent's current estimated action values (Q-values).
The action values are only available after the agent has been primed.
Returns:
Type | Description |
---|---|
NDArray[np.float64] | An array of the action values of each arm. |
Raises:
Type | Description |
---|---|
AgentUsageError | If the agent has not been primed. |
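How the action values are computed depends on the wrapped strategy, which this page does not specify. As one common possibility, the sketch below assumes sample-average estimates, where each arm's value is the running mean of its rewards and the play count serves as the step-size denominator. Plain lists stand in for the arrays, and `record_reward` is a hypothetical helper.

```python
def record_reward(Qs, Ns, arm, reward):
    """Update play counts and sample-average estimates in place."""
    Ns[arm] += 1
    # Incremental mean: move the estimate toward the reward by 1/N of the gap.
    Qs[arm] += (reward - Qs[arm]) / Ns[arm]


Qs = [0.0, 0.0]  # estimated action values, one per arm
Ns = [0, 0]      # play counts, one per arm
for arm, reward in [(0, 1.0), (1, 4.0), (0, 3.0)]:
    record_reward(Qs, Ns, arm, reward)

print(Ns)  # [2, 1]
print(Qs)  # [2.0, 4.0]
```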
__repr__()
Returns the agent's string representation.
Uses the agent's name if set. Otherwise, the string representation of the agent's strategy is used by default.
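The fallback described above might look like the following sketch. `NamedAgent` and `DummyStrategy` are hypothetical, and treating "set" as "not None" and using `str()` for the strategy are assumptions rather than confirmed details of the library.

```python
class DummyStrategy:
    """Hypothetical strategy with its own string representation."""

    def __repr__(self):
        return "eps-greedy (eps=0.1)"


class NamedAgent:
    def __init__(self, strategy, name=None):
        self.strategy = strategy
        self.name = name

    def __repr__(self):
        # Prefer the explicit name; otherwise defer to the strategy.
        if self.name is not None:
            return self.name
        return str(self.strategy)


print(NamedAgent(DummyStrategy()))                   # eps-greedy (eps=0.1)
print(NamedAgent(DummyStrategy(), name="baseline"))  # baseline
```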
choose()
Returns the agent's next choice based on its strategy.
This method can only be called on a primed agent.
Returns:
Type | Description |
---|---|
int | The index of the arm chosen by the agent. |
Raises:
Type | Description |
---|---|
AgentUsageError | If the agent has not been primed. |
prime(k, steps, rng)
Primes the agent before running a trial.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | int | The number of bandit arms for the agent to choose from. | required |
steps | int | The number of steps the simulation will be run for. | required |
rng | Generator | A random number generator. | required |
update(reward)
Updates the agent's internal parameter estimates.
This method can only be called if the agent has previously made a choice, and an update based on that choice has not already been made.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
reward | float | The observed reward from the agent's most recent choice. | required |
Raises:
Type | Description |
---|---|
AgentUsageError | If the agent has not previously made a choice. |
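The ordering rules for choose() and update() can be pictured as a small state machine with a "pending choice" slot. The sketch below is an assumption about how such guards could be written, not the library's code: `GuardedAgent` is hypothetical, `AgentUsageError` is redefined locally so the example runs alone, and the choice is a fixed placeholder.

```python
class AgentUsageError(Exception):
    """Stand-in for the documented error; defined here so the sketch runs alone."""


class GuardedAgent:
    def __init__(self):
        self._primed = False
        self._choice = None  # pending choice awaiting a reward

    def prime(self, k):
        self._k = k
        self._primed = True
        self._choice = None

    def choose(self):
        if not self._primed:
            raise AgentUsageError("choose() called before prime()")
        self._choice = 0  # placeholder: a real agent would ask its strategy
        return self._choice

    def update(self, reward):
        if self._choice is None:
            raise AgentUsageError("update() called without a pending choice")
        self._choice = None  # consume the choice so a second update fails


agent = GuardedAgent()
agent.prime(k=3)
agent.choose()
agent.update(1.0)      # fine: consumes the pending choice
try:
    agent.update(1.0)  # no pending choice left
except AgentUsageError as err:
    print(err)  # update() called without a pending choice
```

Clearing the pending choice inside update() is what enforces the rule that each choice receives exactly one update.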