How can I design my DQN network so that episode Q0 approaches the episode reward?

10 views (last 30 days)
I'm using the reinforcement learning toolbox to design and train a DQN agent.
My action space is discrete, with 24 elements. The network is a convolutional neural network inspired by the dueling DQN architecture; these are my convolutional layers:
Layers = [
    imageInputLayer([h w 5],"Name","imageinput","Normalization","none")
    convolution2dLayer([8 8],64,"Name","conv_1","Padding","same","Stride",[4 4])
    reluLayer("Name","relu1")
    convolution2dLayer([8 8],64,"Name","conv_1.2","Padding","same")
    reluLayer("Name","relu1.2")
    convolution2dLayer([4 4],128,"Name","conv_2","Padding","same","Stride",[2 2])
    reluLayer("Name","relu2")
    convolution2dLayer([4 4],128,"Name","conv_2.2","Padding","same")
    reluLayer("Name","relu2.2")
    convolution2dLayer([4 4],128,"Name","conv_2.3","Padding","same")
    reluLayer("Name","relu2.3")
    convolution2dLayer([3 3],256,"Name","conv_3","Padding","same","Stride",[2 2])
    reluLayer("Name","relu3")
    convolution2dLayer([3 3],256,"Name","conv_3.2","Padding","same")
    reluLayer("Name","relu3.2")
    convolution2dLayer([3 3],256,"Name","conv_3.3","Padding","same")
    reluLayer("Name","relu3.3")];
I'm getting poor results during training: the reward keeps oscillating in the same range of values without improving, as if the agent were not learning. Moreover, at some point the episode Q0 diverges drastically. I found this in the documentation: "For agents with a critic, Episode Q0 is the estimate of the discounted long-term reward at the start of each episode, given the initial observation of the environment. As training progresses, if the critic is well designed, Episode Q0 approaches the true discounted long-term reward, as shown in the preceding figure."
Therefore my question is the following: how can I modify my network so that Q0 approaches the episode reward values? Might there be other problems?
The parameters I'm using for the agent are the following:
criticOpts.Optimizer = 'adam';
criticOpts.LearnRate = 0.00025;
agentOpts.UseDoubleDQN = true;
agentOpts.ExperienceBufferLength = 1e6;
agentOpts.NumStepsToLookAhead = 1;
agentOpts.DiscountFactor = 0.99;
agentOpts.EpsilonGreedyExploration.Epsilon = 1;
agentOpts.EpsilonGreedyExploration.EpsilonMin = 0.1;
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
agentOpts.MiniBatchSize = 64;
agentOpts.TargetUpdateMethod = 'smoothing';
agentOpts.TargetUpdateFrequency = 1;
agentOpts.TargetSmoothFactor = 1e-3;
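For context, here is a minimal sketch of how these options would typically be wired into the agent (obsInfo and actInfo come from the environment, and criticNetwork stands for the complete critic network built from the layers above; those names and the exact constructor calls are assumptions about my setup rather than a fixed recipe):
criticOpts = rlRepresentationOptions('Optimizer','adam','LearnRate',0.00025);
% Multi-output Q-value critic: one Q-value per discrete action
critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
    'Observation',{'imageinput'}, criticOpts);
agentOpts = rlDQNAgentOptions( ...
    'UseDoubleDQN',true, ...
    'ExperienceBufferLength',1e6, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',64, ...
    'TargetSmoothFactor',1e-3, ...
    'TargetUpdateFrequency',1);
agentOpts.EpsilonGreedyExploration.Epsilon = 1;
agentOpts.EpsilonGreedyExploration.EpsilonMin = 0.1;
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
agent = rlDQNAgent(critic, agentOpts);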

Accepted Answer

Emmanouil Tzorakoleftherakis on 5 Mar 2021
There is no single answer here that will get the training to work. My first instinct would be to go for a simpler architecture without convolutional layers, get some result that makes sense (so that you get an idea of which hyperparameters are working), and then move to the dueling DQN architecture.
You would still need to experiment with hyperparams though. First off, reduce the epsilon decay rate to let the agent explore more, and then play with experience buffer length and mini-batch size (the other params can be left at their default values initially).
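For example, a minimal sketch of slowing the decay (the exact value is an assumption and should be tuned to the number of training steps):
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-4;  % much slower than 0.01, so the agent keeps exploring longer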
A couple of things to keep in mind: 1) As of the R2020b release, the default agent feature lets you create an agent just by providing the observation and action info (so there is no need to create the neural network architecture yourself). Take a look here. 2) If you still want to create your own network, make sure you use the multi-output critic architecture (one Q-value output per action); see the sketch below.
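As a rough illustration of point 2, a minimal multi-output critic for a 24-action discrete space could look like this (layer sizes and names are placeholders, not a recommendation; obsInfo, actInfo, h, w and criticOpts as in the question):
numActions = 24;
simpleLayers = [
    imageInputLayer([h w 5],'Name','imageinput','Normalization','none')
    convolution2dLayer([8 8],32,'Name','conv1','Stride',[4 4])
    reluLayer('Name','relu1')
    fullyConnectedLayer(256,'Name','fc1')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','qvalues')];  % one Q-value per action
critic = rlQValueRepresentation(simpleLayers, obsInfo, actInfo, ...
    'Observation',{'imageinput'}, criticOpts);
% Or, as of R2020b, skip the hand-built network entirely and use a default agent:
% agent = rlDQNAgent(obsInfo, actInfo);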
When you get to the point where you can move to dueling DQN, I would start with some architectures published in papers. For example, at first glance the architecture you are showing seems to have a lot of conv layers - did you see this in some paper? This paper, which talks about the same topic, may be a good place to start.
Hope this helps
4 comments
Matteo Padovani on 6 Mar 2021
It was really helpful, since the documentation was not clear to me.
I have one last question. I thought of using the architecture you suggested, but there is the issue of summing the value estimate with the advantage values to obtain the Q-values, since the network has to output them in order to define the agent. Is the only way of doing that to create a custom layer that performs the averaged summation?
And again, thanks a lot.
Emmanouil Tzorakoleftherakis on 6 Mar 2021
You can use the additionLayer - here is an example that shows how to use it to create a critic. As I mentioned, I would start with something simple that does not consider the advantage estimation.
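For reference, here is a hedged sketch of that idea (this is not the linked example; the layer names and sizes are assumptions, the value stream is widened to numActions outputs so both inputs of additionLayer match in size, and there is no mean-subtraction as in the original dueling paper):
numActions = 24;
commonPath = [
    imageInputLayer([h w 5],'Name','imageinput','Normalization','none')
    convolution2dLayer([8 8],32,'Name','conv1','Stride',[4 4])
    reluLayer('Name','relu1')
    fullyConnectedLayer(256,'Name','fc_common')
    reluLayer('Name','relu_common')];
advantagePath = fullyConnectedLayer(numActions,'Name','fc_advantage');  % per-action advantage stream
valuePath = fullyConnectedLayer(numActions,'Name','fc_value');          % value stream, sized to match
lgraph = layerGraph(commonPath);
lgraph = addLayers(lgraph, advantagePath);
lgraph = addLayers(lgraph, valuePath);
lgraph = addLayers(lgraph, additionLayer(2,'Name','qvalues'));  % Q = value + advantage
lgraph = connectLayers(lgraph,'relu_common','fc_advantage');
lgraph = connectLayers(lgraph,'relu_common','fc_value');
lgraph = connectLayers(lgraph,'fc_advantage','qvalues/in1');
lgraph = connectLayers(lgraph,'fc_value','qvalues/in2');
critic = rlQValueRepresentation(lgraph, obsInfo, actInfo, ...
    'Observation',{'imageinput'}, criticOpts);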


More Answers (0)

Release

R2020b
