Reinforcement learning - DDPG - minibatch - Continuous action saturation

What is the influence of the minibatch size? In my case its value determines the moment at which the continuous action signal begins to saturate (it only takes the limit values, the blue line). When the minibatch size is very large, the network weights do not change over time. Action range: [-0.15 0.15].
I don't understand why the action does not saturate before the minibatch point, while the agent is still running with its initial random weights, yet it is the minibatch value that determines where the saturation begins.
Data:
GradientThreshold: 1
LearnRate: 0.03
agentOpts.NoiseOptions.StandardDeviation = 0.001;
agentOpts.NoiseOptions.StandardDeviationDecayRate = 0.00001;
MiniBatchSize: 100 (x axis = 500)
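
For reference, here is a minimal sketch of how the settings above might be collected into a Reinforcement Learning Toolbox options object (this assumes a release where the actor and critic optimizer settings live on the agent options, R2022a or later; exact property names may differ in older versions):

% Agent options with the values listed above
agentOpts = rlDDPGAgentOptions;
agentOpts.MiniBatchSize = 100;
agentOpts.NoiseOptions.StandardDeviation = 0.001;
agentOpts.NoiseOptions.StandardDeviationDecayRate = 0.00001;
% Learn rate and gradient threshold for the actor and critic optimizers
% (ActorOptimizerOptions/CriticOptimizerOptions are assumed to be available)
agentOpts.ActorOptimizerOptions.LearnRate = 0.03;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
agentOpts.CriticOptimizerOptions.LearnRate = 0.03;
agentOpts.CriticOptimizerOptions.GradientThreshold = 1;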

Answers (1)

Yash on 21 Feb 2024
The minibatch size is the number of experiences sampled from the replay buffer at each gradient step to update the network weights. A larger minibatch gives smoother, less noisy gradient estimates but fewer, more expensive updates, while a smaller minibatch produces more frequent but noisier updates. In addition, a DDPG agent does not start updating its networks until the experience buffer holds at least MiniBatchSize samples, so until that point the actor keeps acting with its initial (random) weights.
That matches what you observe: the action only starts to saturate at the limits ([-0.15, 0.15]) once the number of steps reaches the minibatch size and learning begins, and with a very large minibatch the weights never change because that threshold is never reached within the training run. Once updates do start, a relatively large learning rate (0.03) combined with very small exploration noise (0.001) can quickly drive the actor output to its limits. It may be worth experimenting with different minibatch sizes, and possibly a smaller learning rate, to find a suitable combination for your specific case.
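As a rough illustration of that experiment, a loop like the one below could retrain the agent with several minibatch sizes and let you compare when, or whether, the action starts to hit the limits. Here env, obsInfo and actInfo are placeholders for your own environment setup, and the episode count is arbitrary:

% Hypothetical sweep over minibatch sizes
for mbs = [32 64 100 256]
    agentOpts.MiniBatchSize = mbs;
    agent = rlDDPGAgent(obsInfo, actInfo, agentOpts);   % default actor/critic networks
    trainOpts = rlTrainingOptions('MaxEpisodes', 200, 'Verbose', false, 'Plots', 'none');
    train(agent, env, trainOpts);
    % Simulate the trained agent and inspect the action signal
    experience = sim(env, agent);
end

Plotting the Action field of each experience output (or logging the actions during training) then shows at which step the signal reaches [-0.15, 0.15] for each minibatch size.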
