
Reinforcement Learning Agent not taking realistic actions

55 views (last 30 days)
Karim Darwich on 9 Jul 2024 at 7:34
Commented: Karim Darwich on 16 Jul 2024 at 6:04
I am using a PPO agent in a Simulink environment, but the actions produced by the agent seem to be discrete. Specifically, the agent only outputs either the upper limit or the lower limit. Any ideas why this could be happening? I am using the RL Toolbox for training.
Here are some details about my setup:
  • I am using a variable-step Simulink model with the ode23t solver.
  • My Simulink model uses the Simscape library for thermal fluids and simulates a simplified district heating network (DHN). The DHN has two branches, NORTH (NORD) and SOUTH (SUD).
  • I am trying to use an RL agent to optimize control, initially focusing on minimizing energy costs by changing the mass flow in the branches.
Regarding the agent's hyperparameters, I am using the RL Toolbox with the following settings (a sketch of how these could map onto rlPPOAgentOptions follows the list):
  • Sample time = 3600
  • Discount factor = 0.99
  • Training on GPU
  • Batch size = 512
  • Learning rate = 1e-3 (for both actor and critic)
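A minimal sketch of how these settings could be assembled into a default PPO agent with the RL Toolbox (the observation and action dimensions here are assumptions for illustration, not taken from my actual model):
obsInfo = rlNumericSpec([8 1]); % assumed observation size
actInfo = rlNumericSpec([2 1],"LowerLimit",-1,"UpperLimit",1); % assumed normalized mass-flow actions
agentOpts = rlPPOAgentOptions(SampleTime=3600,DiscountFactor=0.99,MiniBatchSize=512);
agentOpts.ActorOptimizerOptions.LearnRate = 1e-3;  % actor learning rate
agentOpts.CriticOptimizerOptions.LearnRate = 1e-3; % critic learning rate
agent = rlPPOAgent(obsInfo,actInfo,agentOpts); % default actor/critic networks
% GPU use is typically enabled with UseDevice="gpu" when the actor and critic
% approximators are created manually.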
I suspect there might be an issue with either my model or the agent. I will attach the Simulink model (the properties table should be loaded beforehand). I hope the problem is clear and that someone can help!
Thank you in advance!

Accepted Answer

Kaustab Pal on 16 Jul 2024 at 4:19
From what I understand, you have a constrained action space, but after training the PPO agent, the agent is only outputting actions that are either at the upper or lower bounds of the action space. I don't think the problem arises from how the model or the agent is defined.
In the following code snippet, I have defined obs, act, and the agent in the same format as you have. I ran it for multiple iterations, and each time I received different action outputs.
obs = rlNumericSpec([8 1]);
act = rlNumericSpec([2 1],"LowerLimit",-1,"UpperLimit",1);
agent = rlPPOAgent(obs,act);
a = getAction(agent, rand(obs.Dimension)); % sample a random observation and get an action from the agent
a = a{1};
disp(a);
This indicates that there are no issues with the model or the agent's definition. I suspect that the problem might lie in the design of the reward function.
Additionally, please note that for any agent other than DDPG, TD3, and SAC, if you want to enforce lower or upper limits on the actions, you need to do so within the environment itself (see the sketch below).
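As an illustration (a sketch with assumed names, not code from the model above), in a custom MATLAB environment the action can be saturated to its bounds inside the step function before it is applied; in a Simulink environment, a Saturation block between the RL Agent block and the plant serves the same purpose:
function clipped = clipToActionSpec(action, actInfo)
% Saturate the agent's action to the limits of an rlNumericSpec,
% since PPO does not enforce the action bounds on its own.
clipped = min(max(action, actInfo.LowerLimit), actInfo.UpperLimit);
end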
  1 comment
Karim Darwich on 16 Jul 2024 at 6:04
@Kaustab Pal Thank you very much! I will reconstruct the reward function and try again. Have a wonderful day!


More Answers (0)

Products


Version

R2023a
