agent doesn't take different actions to different states

Question

Bryan on 21 Jun 2024

Direct link to this question

https://la.mathworks.com/matlabcentral/answers/2130956-agent-doesn-t-take-different-actions-to-different-states

Edited: Alan on 4 Jul 2024

Hello everyone,

I have two issues:

1. I wasn't able to set up the environment so that the agent takes 24 different actions over the course of a day, that is, one action every hour. As a workaround, I decided to train agents by the hour.
2. The second issue, which is the reason for my question, appears after training. When I test the agent's decision-making by running the simulation part of the RL Toolbox, the agent always takes the same action regardless of the state of the environment. This leads me to believe that training settles on a single best action for the whole set of states, which is not what I want. I want the agent to take the correct action for each state. I've been analyzing my environment code but can't figure out why the agent behaves this way.

Thank you in advance.

Bryan

3 comments

Umar on 22 Jun 2024

Hi Bryan,

To address the first issue of setting up the environment for the agent to take 24 different actions per day, ensure that your action space is correctly defined to encompass all 24 actions. You can use a discrete action space with 24 elements representing each action.

For the second issue of the agent always selecting the same action during testing, it indicates a potential problem in the training process. Check your reward function and exploration strategy. Ensure that your agent explores different actions during training to learn the optimal policy for various states. Adjusting the exploration rate or using different exploration strategies like epsilon-greedy might help in this scenario.
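As a rough illustration of adjusting exploration: the thread does not say which agent is used, so the sketch below assumes a DDPG agent, where exploration comes from noise added to the continuous action rather than from epsilon-greedy selection (which applies to discrete-action agents such as DQN). The numeric values are arbitrary placeholders, and the property names are those of recent Reinforcement Learning Toolbox releases; older releases may name them differently.

```matlab
% Assumption: a DDPG agent; the thread does not state the agent type.
% Requires Reinforcement Learning Toolbox.
agentOpts = rlDDPGAgentOptions;

% Exploration noise added to every action during training (placeholder values).
agentOpts.NoiseOptions.StandardDeviation = 0.3;            % initial noise level
agentOpts.NoiseOptions.StandardDeviationDecayRate = 1e-5;  % slow decay keeps exploring longer
agentOpts.NoiseOptions.StandardDeviationMin = 0.05;        % never drop below this level

disp(agentOpts.NoiseOptions)
```

With action limits of -1 and 1, a noise level that decays too quickly can leave the agent exploiting an early guess, so the decay rate is usually the first thing worth checking.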

Additionally, review your neural network architecture, particularly the output layer, to ensure it can represent the Q-values for all actions correctly. Debugging the training process and analyzing the agent's learning progress over episodes can provide insights into why it converges to a single action.

By addressing these aspects, you can troubleshoot the issues with your reinforcement learning agent's behavior and improve its decision-making across different states in the environment.

Bryan on 23 Jun 2024

Thank you for your observation.

Regarding the first issue, I have 5 objects that can vary between -1 and 1, so they are not discrete. Thus, I have defined my actions as follows:

ActionInfo = rlNumericSpec([5 1], 'LowerLimit', [-1; -1; -1; -1; -1], 'UpperLimit', [1; 1; 1; 1; 1]);

I understand that if we consider 24 actions, it should be defined as 5 by 24:

ActionInfo = rlNumericSpec([120 1], ...

Isn't this action dimension too large?

Another option I considered is changing the observation dimension, which originally is:

ObservationInfo = rlNumericSpec([1 99])

To:

ObservationInfo = rlNumericSpec([24 99])

The problem with this option comes from the "step" function. While I can obtain the observation in "reset" without issues, in "step" I cannot correctly define taking a different action each hour to get the next observation.
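One possible way around both options is to keep the action at [5 1] and the observation at [1 99], and let each call to "step" represent one hour, so that an episode of 24 steps covers one day. The following is only a minimal sketch under that assumption: the function names hourlyStep and hourlyReset, the Hour and State fields, and the dynamics and reward are all hypothetical placeholders, not the actual model.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
% Dynamics and reward are placeholders, not the real environment model.
obsInfo = rlNumericSpec([1 99]);
actInfo = rlNumericSpec([5 1], 'LowerLimit', -ones(5,1), 'UpperLimit', ones(5,1));

% Custom environment built from a step function and a reset function.
env = rlFunctionEnv(obsInfo, actInfo, @hourlyStep, @hourlyReset);

function [obs, info] = hourlyReset()
    info.Hour = 1;                  % hour index within the day, 1..24
    info.State = zeros(1, 99);      % placeholder initial state
    obs = info.State;
end

function [nextObs, reward, isDone, info] = hourlyStep(action, info)
    % One call = one hour; the agent supplies one 5-by-1 action per call.
    action = max(min(action, 1), -1);                     % clip to the action limits
    info.State(1:5) = 0.9*info.State(1:5) + 0.1*action';  % placeholder dynamics
    nextObs = info.State;
    reward = -sum(action.^2);                             % placeholder reward
    isDone = info.Hour >= 24;                             % stop after 24 hourly steps
    info.Hour = info.Hour + 1;
end
```

With this layout a single agent sees 24 observations per episode and can choose a different action for each one. Setting MaxStepsPerEpisode to 24 in rlTrainingOptions would match the episode length.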

Regarding the second issue, during training, I displayed the actions taken and indeed, different actions are taken until the end of the training, where the same action is taken despite different states. As for the network architecture, it was created by the toolbox, so I wouldn't know how to respond to your comment. Therefore, I am attaching an image of the actor and critic, and the training graph.
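A quick way to confirm the behavior outside of a full simulation is to feed the trained policy several different observations and measure how much its output changes. The sketch below is hypothetical: it uses a stand-in policy so that it runs on its own, and the random probes should be replaced with realistic states from the environment.

```matlab
% Sketch: probe a policy with different observations and measure the action spread.
% With a trained agent, replace the stand-in below with:
%   policyFcn = @(obs) cell2mat(getAction(agent, {obs}));
rng(0);                                  % reproducible probes
policyFcn = @(obs) tanh(obs(1:5)');      % stand-in policy (assumption, not a trained actor)

numProbes = 10;
actions = zeros(5, numProbes);
for k = 1:numProbes
    obs = 2*rand(1, 99) - 1;             % hypothetical probe; use realistic states instead
    actions(:, k) = policyFcn(obs);      % one 5-by-1 action per probe
end

% Range of each action component across all probes.
spread = max(actions, [], 2) - min(actions, [], 2);
fprintf('Action range across probes: %s\n', mat2str(spread', 3));
if all(spread < 1e-3)
    disp('Policy output is effectively constant across the probes.');
end
```

If the spread is close to zero and the actions sit at -1 or 1, one possible explanation is that the actor output is saturating, which can happen when observations are large or unscaled. This is only one candidate cause and would need to be checked against the actual environment.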

Thank you very much in advance.

Bryan

Alan on 4 Jul 2024

Edited: Alan on 4 Jul 2024

Hi Bryan,

Could you describe your environment a bit more? The following is some information I would like to know:

What happens in each step of the episode? Does a step span an hour or 24h?
How have you modeled your reward function? Does it incentivize the agent well?
What agent are you using?

It would be great if you could share the environment file and the training script as well.

Regards.

Answers (0)
