
No convergence during training using a TD3 RL agent

11 views (last 30 days)
Gaurav
Gaurav on 20 Apr 2024
Answered: Ayush Anand on 22 May 2024
I am trying to train an agent to navigate a multirotor to a particular 3D coordinate. I am using a TD3 agent with the same configuration as the Train Biped Robot to Walk Using Reinforcement Learning Agents example (Link: Train Biped Robot to Walk Using Reinforcement Learning Agents - MATLAB & Simulink (mathworks.com)). In my case I have a 16-dimensional observation space and a 4-dimensional action space. I normalize my observations before passing them to the agent, and the action output by the agent is also normalized between -1 and 1, which I later scale up before passing it to the multirotor environment.
While training, the agent's rewards drop to zero all the time. I have attached the training results as an image. In the image you can see that the rewards are between 1000 and zero, and they keep dropping to zero; the agent cannot maintain a consistently high reward.
Also the agent is trained using parallel computing.
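For reference, the action scaling described above is typically a linear map from the agent's normalized output to the actuator range. A minimal sketch (the function name and ranges are hypothetical, not from the original setup):

```matlab
% Hypothetical helper: map a normalized action a in [-1, 1] to an
% actuator range [lo, hi] before passing it to the multirotor model.
function u = scaleAction(a, lo, hi)
    u = lo + (a + 1) .* (hi - lo) / 2;
end
```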

Answers (1)

Ayush Anand
Ayush Anand on 22 May 2024
The reward continuously dropping to zero suggests that the agent might be struggling with the complexity of the task, the design of the reward function, or issues in the training setup. Here are a few potential reasons:
  1. Inadequate reward function: If the reward is sparse (i.e., the agent gets infrequent feedback) or the reward scale is inappropriate, the agent will fail to learn properly. Ensure that the reward is shaped properly.
  2. Exploration strategy: TD3 relies on noise-based exploration, so make sure the exploration noise is appropriately scaled; exploration should be smooth without causing erratic behavior.
  3. Learning parameters: You could experiment with the learning rates of the actor and critic networks, as well as with different batch sizes and replay buffer capacities. You could also try adjusting the discount factor (gamma) and the target update frequency.
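The options above can be set through "rlTD3AgentOptions". A minimal sketch of points 2 and 3; all numeric values here are illustrative placeholders, not tuned recommendations:

```matlab
% Illustrative TD3 hyperparameters (placeholder values, not tuned).
agentOpts = rlTD3AgentOptions( ...
    MiniBatchSize=256, ...
    ExperienceBufferLength=1e6, ...
    DiscountFactor=0.99, ...
    TargetUpdateFrequency=2);

% Exploration noise: keep it a fraction of the [-1, 1] action range and
% decay it so that early exploration does not remain erratic later on.
agentOpts.ExplorationModel.StandardDeviation = 0.1;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-4;

% Separate learning rates for the actor and critic networks.
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3;
```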
You can refer to the following links to explore the different options of the "rlTD3Agent" in MATLAB:
  1. https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rltd3agent.html
  2. https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rltd3agentoptions.html
