
No convergence during training using a TD3 RL agent

11 views (last 30 days)
Gaurav
Gaurav on 20 Apr 2024
Answered: Ayush Anand on 22 May 2024
I am trying to train an agent to navigate a multirotor to a particular 3D coordinate. I am using a TD3 agent with the same configuration as the Train Biped Robot to Walk Using Reinforcement Learning Agents example (Link: Train Biped Robot to Walk Using Reinforcement Learning Agents - MATLAB & Simulink (mathworks.com)). In my case I have a 16-dimensional observation space and a 4-dimensional action space. I normalize my observations before passing them to the agent, and the action output by the agent is also normalized between -1 and 1, which I later scale up before passing it to the multirotor environment.
While training, the agent's rewards drop to zero all the time. I have attached the training results as an image. In the image you can see that the rewards are between 1000 and zero, and they keep dropping to zero; the agent cannot maintain a consistently high reward.
Also the agent is trained using parallel computing.
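For reference, the action scaling described above is typically a linear map from the agent's normalized output to the actuator range. A minimal sketch (the function name and ranges are hypothetical, not from the original setup):

```matlab
% Hypothetical helper: map a normalized action a in [-1, 1] to an
% actuator range [lo, hi] before passing it to the multirotor model.
function u = scaleAction(a, lo, hi)
    u = lo + (a + 1) .* (hi - lo) / 2;
end
```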

Answers (1)

Ayush Anand
Ayush Anand on 22 May 2024
The reward continuously dropping to zero suggests that the agent might be struggling with the complexity of the task, the design of the reward function, or issues in the training setup. Here are a few potential reasons:
  1. Inadequate reward function: If the reward is sparse (i.e., the agent gets infrequent feedback) or the reward scale is inappropriate, the agent will fail to learn properly. Ensure that the reward is shaped properly.
  2. Exploration strategy: TD3 relies on noise-based exploration, so make sure the exploration noise is appropriately scaled; exploration should be smooth without causing erratic behavior.
  3. Learning parameters: You could experiment with the learning rates of the actor and critic networks, as well as with different batch sizes and replay buffer capacities. You could also try adjusting the discount factor (gamma) and the target update frequency.
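The options above can be set through "rlTD3AgentOptions". A minimal sketch of points 2 and 3; all numeric values here are illustrative placeholders, not tuned recommendations:

```matlab
% Illustrative TD3 hyperparameters (placeholder values, not tuned).
agentOpts = rlTD3AgentOptions( ...
    MiniBatchSize=256, ...
    ExperienceBufferLength=1e6, ...
    DiscountFactor=0.99, ...
    TargetUpdateFrequency=2);

% Exploration noise: keep it a fraction of the [-1, 1] action range and
% decay it so that early exploration does not remain erratic later on.
agentOpts.ExplorationModel.StandardDeviation = 0.1;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-4;

% Separate learning rates for the actor and critic networks.
agentOpts.ActorOptimizerOptions.LearnRate  = 1e-4;
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3;
```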
You can refer to the following links to explore the different options of the "rlTD3Agent" in MATLAB:
  1. https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rltd3agent.html
  2. https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rltd3agentoptions.html
