Bernoulli Bandits

We will walk through an example using mabby to run a classic "Bernoulli bandits" simulation.

from mabby import BernoulliArm, Bandit, Metric, Simulation
from mabby.strategies import BetaTSStrategy, EpsilonGreedyStrategy, UCB1Strategy

Configuring bandit arms

To set up our simulation, we start by configuring the multi-armed bandit. We want to simulate a 3-armed bandit in which each arm's rewards follow a Bernoulli distribution, with success probabilities p of 0.5, 0.6, and 0.7 respectively.

ps = [0.5, 0.6, 0.7]

We create a BernoulliArm for each arm, then create a Bandit using the list of arms.

arms = [BernoulliArm(p) for p in ps]
bandit = Bandit(arms=arms)

Because all our arms are of the same type (i.e., their rewards follow the same type of distribution), we can also use the equivalent shorthand below to create the bandit.

bandit = BernoulliArm.bandit(p=ps)
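To make the reward model concrete, here is a minimal plain-Python sketch of what playing a Bernoulli arm means. The class SketchBernoulliArm and its play method are hypothetical names used for illustration only; they are not mabby's BernoulliArm and make no claim about how mabby implements it.

```python
import random


class SketchBernoulliArm:
    """Hypothetical stand-in for a Bernoulli arm; not mabby's BernoulliArm."""

    def __init__(self, p: float) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.p = p

    def play(self, rng: random.Random) -> int:
        # Reward is 1 with probability p, otherwise 0.
        return 1 if rng.random() < self.p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
sketch_arms = [SketchBernoulliArm(p) for p in [0.5, 0.6, 0.7]]

# The empirical mean reward of each arm over many plays approaches its p.
n_plays = 10_000
empirical_means = [
    sum(arm.play(rng) for _ in range(n_plays)) / n_plays for arm in sketch_arms
]
```

The third arm has the highest expected reward (0.7), so a good strategy should learn to play it most of the time.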

Configuring bandit strategies

Next, we need to configure the strategies we want to simulate on the bandit we just created. We will compare three strategies:

epsilon-greedy algorithm (EpsilonGreedyStrategy)
upper confidence bound (UCB1) algorithm (UCB1Strategy)
Thompson sampling with Beta priors (BetaTSStrategy)

We create each of the strategies with the appropriate hyperparameters.

strategy_1 = EpsilonGreedyStrategy(eps=0.2)
strategy_2 = UCB1Strategy(alpha=0.5)
strategy_3 = BetaTSStrategy(general=True)

strategies = [strategy_1, strategy_2, strategy_3]
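For intuition, the arm-selection rule behind each of the three strategies can be sketched in plain Python. This is an illustration under stated assumptions, not mabby's implementation: the function names and arguments are hypothetical, the UCB1 bonus is written as sqrt(alpha * ln(t) / n), which is one common form (how mabby applies alpha may differ), the Thompson sampling sketch assumes a uniform Beta(1, 1) prior, and the general=True option is not modeled.

```python
import math
import random


def epsilon_greedy(counts, means, eps, rng):
    """With probability eps explore uniformly; otherwise pick the best mean."""
    if rng.random() < eps:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=lambda i: means[i])


def ucb1(counts, means, alpha, t):
    """Pick the arm with the highest mean plus exploration bonus.

    t is the total number of plays so far (assumed >= 1 once all arms are tried).
    """
    # Play every arm once before the bonus is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(alpha * math.log(t) / counts[i]),
    )


def beta_thompson(successes, failures, rng):
    """Sample each arm's Beta(1 + s, 1 + f) posterior; pick the largest draw."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])


# Usage with made-up statistics for a 3-armed bandit.
rng = random.Random(0)
counts, means = [10, 10, 10], [0.4, 0.5, 0.7]
choice_eg = epsilon_greedy(counts, means, eps=0.0, rng=rng)  # eps=0: pure greedy
choice_ucb = ucb1(counts, means, alpha=0.5, t=sum(counts))
choice_untried = ucb1([0, 5, 5], [0.0, 0.5, 0.7], alpha=0.5, t=10)
choice_ts = beta_thompson([0, 0, 1000], [1000, 1000, 0], rng)
```

All three rules trade off exploiting the arm that currently looks best against exploring arms whose estimates are still uncertain; they differ only in how that exploration is driven (random, bonus-based, or posterior sampling).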

Running a simulation

Now, we can set up a simulation and run it. We first create a Simulation with our bandit and strategies.

simulation = Simulation(
    bandit=bandit, strategies=strategies, names=["eps-greedy", "ucb1", "thompson"]
)

Then, we run our simulation for 100 trials of 300 steps each. We also specify that we want to collect statistics on the optimality (Metric.OPTIMALITY) and cumulative regret (Metric.CUM_REGRET) of each strategy. Running the simulation returns a SimulationStats object holding the statistics we requested.

metrics = [Metric.OPTIMALITY, Metric.CUM_REGRET]
stats = simulation.run(trials=100, steps=300, metrics=metrics)
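To clarify what the two metrics measure, the sketch below computes them by hand for an epsilon-greedy player in plain Python, without mabby. The definitions used here are assumptions based on common usage: optimality at a step is the fraction of trials in which the best arm was chosen, and cumulative regret is the running sum of the gap between the best arm's p and the chosen arm's p, averaged over trials. mabby's exact definitions may differ.

```python
import random

ps = [0.5, 0.6, 0.7]
best_arm = max(range(len(ps)), key=lambda i: ps[i])
trials, steps, eps = 100, 300, 0.2
rng = random.Random(0)  # fixed seed so the run is repeatable

optimal_picks = [0] * steps     # times the best arm was chosen at each step
cum_regret_sum = [0.0] * steps  # cumulative regret per step, summed over trials

for _ in range(trials):
    counts = [0] * len(ps)
    means = [0.0] * len(ps)
    cum_regret = 0.0
    for t in range(steps):
        # Epsilon-greedy choice: explore with probability eps, else exploit.
        if rng.random() < eps:
            arm = rng.randrange(len(ps))
        else:
            arm = max(range(len(ps)), key=lambda i: means[i])
        reward = 1 if rng.random() < ps[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average

        # Regret uses expected rewards: gap between the best p and the chosen p.
        cum_regret += ps[best_arm] - ps[arm]
        optimal_picks[t] += arm == best_arm
        cum_regret_sum[t] += cum_regret

optimality = [n / trials for n in optimal_picks]
mean_cum_regret = [r / trials for r in cum_regret_sum]
```

Under these definitions, a strategy that learns well shows optimality rising toward 1 and a cumulative regret curve that flattens over time.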

Visualizing simulation statistics

After running our simulation, we can visualize the statistics we collected by calling the plotting methods of the returned SimulationStats object.

stats.plot_optimality()


stats.plot_regret(cumulative=True)
