RL Toolbox: DQN epsilon-greedy exploration with epsilon=1 does not act randomly
Tobias Schindler
on 25 Jan 2021
Commented: Tobias Schindler
on 5 Oct 2021
Setup:
- Custom Simulink environment
- DQN Agent
To get a baseline of the environment, I started training a DQN agent with:
opt.EpsilonGreedyExploration.Epsilon=1;
opt.EpsilonGreedyExploration.EpsilonDecay=0.0;
opt.EpsilonGreedyExploration.EpsilonMin=1;
This means the agent should not exploit the greedy action at all. As stated in the documentation (https://de.mathworks.com/help/reinforcement-learning/ug/dqn-agents.html):
During each control interval, the agent either selects a random action with probability ϵ or selects an action greedily with respect to the value function with probability 1-ϵ.
--> Epsilon=1 therefore means the greedy action is selected with probability zero. The documentation does not clearly state how the random action is sampled, but it should be uniform.
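The rule quoted above can be written out as a toy function to make the epsilon=1 case concrete. This is only an illustrative sketch of the stated rule, not the toolbox's actual implementation; `qValues` is a hypothetical vector of Q-values over the discrete actions:

```matlab
% Toy sketch of the epsilon-greedy rule from the documentation.
% NOT the toolbox's internal code -- just the rule it describes.
function a = epsilonGreedy(qValues, epsilon)
    if rand < epsilon
        a = randi(numel(qValues));   % explore: uniform random action index
    else
        [~, a] = max(qValues);       % exploit: greedy action index
    end
end
```

With epsilon = 1, `rand < 1` holds with probability 1, so the greedy branch should never be taken during training.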
Now, with the above settings, the DQN agent should never exploit the greedy policy during training. However, when starting the simulation and watching the output of the episodes, it is clear that the agent does in fact exploit the policy and does not act randomly.
- What is going on here? Why does the agent not act randomly during training?
- Is the sampling of the actions uniform? (Not related to the epsilon=1 behavior)
- When exactly is the decay executed? I think I read somewhere in the documentation that it happens every training step, i.e., for DQN at every time step of the simulation with the SampleTime from rlDQNAgentOptions. It would be handy to have this stated clearly in the part of the documentation that explains epsilon-greedy exploration.
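For what it's worth, my reading of the rlDQNAgentOptions description is a multiplicative schedule applied once per agent time step during training, floored at EpsilonMin. A sketch of that interpretation (the decay values below are made up for illustration):

```matlab
% Sketch of the decay schedule as I read it from rlDQNAgentOptions:
% once per agent time step during training,
%   epsilon = epsilon*(1 - EpsilonDecay)   while epsilon > EpsilonMin.
epsilon      = 1;
epsilonDecay = 0.005;   % made-up illustrative value
epsilonMin   = 0.01;    % made-up illustrative value
for step = 1:1000
    epsilon = max(epsilonMin, epsilon*(1 - epsilonDecay));
end
% With EpsilonDecay = 0, as in the settings above, epsilon never changes.
```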
I quite like the toolbox so far; there are just some implementation details that are a bit hard to grasp, i.e., it's not 100 % clear to me how MATLAB does it.
0 comments
Accepted Answer
Emmanouil Tzorakoleftherakis
on 9 Feb 2021
Edited: Emmanouil Tzorakoleftherakis
on 9 Feb 2021
Hello,
Maybe I misread the question, but you say "when starting the Simulation and watching the output of the episodes...". Just to clarify: if you hit the "play" button in Simulink or use the "sim" command, exploration is out of the picture; Simulink only does inference on the agent. Exploration is used only when you call "train".
To your other question: sampling in DQN is indeed uniform during exploration.
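In other words, assuming an environment `env`, an agent `agent`, and training options `trainOpts` from a typical setup (these variable names are placeholders, not from the original post):

```matlab
% Inference only: the greedy policy is evaluated, epsilon is ignored.
experience = sim(env, agent);

% Training: epsilon-greedy exploration is active here (and only here).
trainStats = train(agent, env, trainOpts);
```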
8 comments
More Answers (0)