10- Armed Bandit Test bed using greedy algorithm

version (1.35 KB) by Sai Sandeep Damera
A script that creates a 10-armed bandit testbed and runs a greedy algorithm on it.


Updated 12 Mar 2018

The testbed is a set of 2000 randomly generated k-armed bandit problems with k = 10. For each bandit problem, the action values q*(a), a = 1, 2, ..., 10, are drawn from a normal (Gaussian) distribution with mean 0 and variance 1. When a learning method applied to that problem selects action A_t at time step t, the actual reward R_t is drawn from a normal distribution with mean q*(A_t) and variance 1.
For any learning method, we can measure its performance and behavior as it improves with experience over 1000 time steps on one of the bandit problems. This makes up one run. Repeating this for 2000 independent runs, each with a different bandit problem, gives a measure of the learning algorithm's average behavior.
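
As a rough illustration of the setup described above, the following minimal sketch builds a single bandit problem and draws one reward from it. The variable names (qStar, a, reward, bestArm) and the fixed seed are assumptions made for this example and are not taken from the submitted script.

```matlab
% Minimal sketch of one testbed problem (all names are illustrative).
rng(0);                          % fixed seed so the example is repeatable
k = 10;                          % number of arms
qStar = randn(1, k);             % true action values q*(a) ~ N(0, 1)

a = 3;                           % arbitrary action, chosen only for illustration
reward = qStar(a) + randn();     % reward R_t ~ N(q*(A_t), 1)

[~, bestArm] = max(qStar);       % arm with the highest true value
fprintf('Reward from arm %d: %.3f (best arm is %d)\n', a, reward, bestArm);
```

A full experiment would repeat this for every run, drawing a fresh qStar each time.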
We use the sample-average method for the action-value estimates and evaluate a greedy algorithm by plotting its average reward at each time step, averaged over the 2000 runs. The code can also be modified for a non-greedy algorithm, such as epsilon-greedy.
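
One way the experiment could be written is sketched below. It is not the submitted script: the variable names, the random tie-breaking among equally valued arms, the zero initial estimates, and the epsilon parameter (left at 0 for purely greedy behavior) are all assumptions made for illustration.

```matlab
% Sketch: greedy action selection with sample-average estimates.
% Names and structure are illustrative, not taken from the submitted script.
rng(0);                                  % fixed seed for repeatability
k = 10; nSteps = 1000; nRuns = 2000;
epsilon = 0;                             % 0 = purely greedy; e.g. 0.1 gives epsilon-greedy
avgReward = zeros(1, nSteps);            % holds reward sums, then averages

for run = 1:nRuns
    qStar = randn(1, k);                 % new bandit problem for each run
    Q = zeros(1, k);                     % action-value estimates, initialised to 0
    N = zeros(1, k);                     % number of times each arm was selected
    for t = 1:nSteps
        if rand() < epsilon
            a = randi(k);                % explore: pick any arm uniformly
        else
            best = find(Q == max(Q));    % all arms tied for the best estimate
            a = best(randi(numel(best)));  % break ties at random
        end
        r = qStar(a) + randn();          % reward ~ N(q*(a), 1)
        N(a) = N(a) + 1;
        Q(a) = Q(a) + (r - Q(a)) / N(a); % incremental sample average
        avgReward(t) = avgReward(t) + r; % accumulate reward at step t
    end
end
avgReward = avgReward / nRuns;           % average over all runs

plot(1:nSteps, avgReward);
xlabel('Steps'); ylabel('Average reward');
title('Greedy, sample-average estimates');
```

The incremental update Q(a) + (r - Q(a)) / N(a) yields the same value as averaging all rewards received from arm a, without storing them. Setting epsilon to a small positive value turns the same loop into an epsilon-greedy method.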

Cite As

Sai Sandeep Damera (2022). 10- Armed Bandit Test bed using greedy algorithm (, MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2017b
Compatible with any release
Platform Compatibility
Windows macOS Linux
