This was a set of 2000 randomly generated k-armed bandit
problems with k = 10. For each bandit problem, the action values,
q*(a), a = 1, 2, ..., 10, were selected according to a normal (Gaussian) distribution with mean 0 and
variance 1. Then, when a learning method applied to that problem selected action At at time step t,
the actual reward, Rt, was selected from a normal distribution with mean q*(At) and variance 1.
For any learning method, we can measure its performance and behavior as it improves with experience over
1000 time steps when applied to one of the bandit problems. This makes up one run. Repeating this
for 2000 independent runs, each with a different bandit problem, we obtained measures of the learning
algorithm's average behavior.
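The submission itself is MATLAB, but the testbed setup above can be sketched in a few lines of Python/NumPy (names such as `q_star` and `reward` are illustrative, not from the original code): draw the true action values once per problem, then sample each reward around the chosen action's true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_problems, k = 2000, 10

# True action values q*(a) ~ N(0, 1), one row per bandit problem.
q_star = rng.normal(0.0, 1.0, size=(n_problems, k))

def reward(problem, action):
    """Sample the reward R_t ~ N(q*(A_t), 1) for the chosen action."""
    return rng.normal(q_star[problem, action], 1.0)
```

A single run then applies a learning method to one row of `q_star` for 1000 time steps; averaging over all 2000 rows gives the learning curves described above.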
We use the sample-average technique for the action-value estimates and evaluate a greedy algorithm by plotting the reward at each time step, averaged over the 2000 runs. The code can be modified for a non-greedy (e.g., epsilon-greedy) algorithm as well.
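A minimal Python/NumPy sketch of the greedy sample-average method (the function name `greedy_run` and the reduced run count are illustrative assumptions, not the submission's MATLAB code): estimates are updated incrementally, Q(a) += (R - Q(a)) / N(a), which is equivalent to the sample average of observed rewards.

```python
import numpy as np

def greedy_run(q_star, steps=1000, rng=None):
    """One run of the greedy algorithm with sample-average estimates."""
    if rng is None:
        rng = np.random.default_rng()
    k = len(q_star)
    Q = np.zeros(k)            # action-value estimates, initialized to 0
    N = np.zeros(k)            # number of times each action was taken
    rewards = np.empty(steps)
    for t in range(steps):
        a = int(np.argmax(Q))                  # greedy action (ties -> lowest index)
        r = rng.normal(q_star[a], 1.0)         # reward ~ N(q*(a), 1)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]              # incremental sample-average update
        rewards[t] = r
    return rewards

# Average-reward curve over independent runs (200 here for speed; the
# submission uses 2000).
rng = np.random.default_rng(0)
runs = [greedy_run(rng.normal(0.0, 1.0, 10), rng=rng) for _ in range(200)]
avg_reward = np.mean(runs, axis=0)
```

Plotting `avg_reward` against the time step reproduces the greedy curve; an epsilon-greedy variant only needs the action-selection line changed to explore with probability epsilon.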
Sai Sandeep Damera (2021). 10-Armed Bandit Test bed using greedy algorithm (https://www.mathworks.com/matlabcentral/fileexchange/66467-10-armed-bandit-test-bed-using-greedy-algorithm), MATLAB Central File Exchange.