High Episode Reward for rlTD3 agent

Question

MD SHAHED on 7 Jul 2022

https://la.mathworks.com/matlabcentral/answers/1755555-high-episode-reward-for-rltd3-agent

Answered: Jaynik on 27 Dec 2023

The Episode Manager shows a high episode reward. Is that because of the reward function? I am using the LQG criterion for the reward function.

Answer 1

Jaynik on 27 Dec 2023

https://la.mathworks.com/matlabcentral/answers/1755555-high-episode-reward-for-rltd3-agent#answer_1378467

Hi,

I understand that you are using the LQG criterion as the reward function and that the Episode Manager shows a high episode reward.

Yes, the episode reward shown in the Episode Manager comes directly from your reward function: it is the sum of the rewards that function returns over all steps of one episode. In reinforcement learning, it measures how well the agent performed during that episode, as judged by the reward function you have defined.

In the context of LQG, the reward function is usually formulated as the negative of the LQG cost, because reinforcement learning maximizes rewards whereas control theory minimizes costs. Since a quadratic cost is non-negative, such a reward is at most zero, and a high episode reward means a value close to zero, that is, a low LQG cost. This suggests that the agent's policy is controlling the system with small error and small control effort, in line with the LQG objective.
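Written out, the relationship might look like the following. This is only a sketch that assumes a discrete-time quadratic stage cost; the weights $Q \succeq 0$ and $R \succ 0$, the horizon $T$, and the symbols $x_t$, $u_t$ are not given in the question and are introduced here purely for illustration.

```latex
r_t = -\left( x_t^\top Q\, x_t + u_t^\top R\, u_t \right),
\qquad
G = \sum_{t=0}^{T-1} r_t = -J_T \le 0,
\qquad
J_T = \sum_{t=0}^{T-1} \left( x_t^\top Q\, x_t + u_t^\top R\, u_t \right)
```

Under these assumptions the episode reward $G$ can never exceed zero, so a large positive value in the Episode Manager would hint that the sign of the reward is flipped or that extra terms are being added.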

If you're using the LQG criterion for the reward function, ensure that:

The reward function correctly reflects the LQG objectives, considering both the state and control input.
The system dynamics are appropriate for LQG control, which assumes linearity and Gaussian noise.
The agent's policy and learning algorithm are suitable for the problem and are effectively using the reward signal to learn.
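The first check in the list above can be sketched numerically. The snippet below is written in plain Python for illustration only; the same arithmetic would carry over to a MATLAB reward function. The weights `Q` and `R`, the short trajectory, and the helper name `lqg_step_reward` are all hypothetical and are not taken from the question.

```python
def lqg_step_reward(x, u, Q, R):
    """Return the negative quadratic stage cost -(x'Qx + u'Ru) for list inputs."""
    def quad(v, M):
        # Quadratic form v' M v computed without external libraries
        n = len(v)
        return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
    return -(quad(x, Q) + quad(u, R))


# Hypothetical weights: state error is penalized more heavily than control effort
Q = [[1.0, 0.0], [0.0, 0.1]]
R = [[0.01]]

# Hypothetical three-step trajectory (states and the actions taken in them)
states = [[1.0, 0.0], [0.5, -0.2], [0.1, -0.05]]
actions = [[-2.0], [-0.5], [0.0]]

step_rewards = [lqg_step_reward(x, u, Q, R) for x, u in zip(states, actions)]

# The episode reward is the sum of the per-step rewards
episode_reward = sum(step_rewards)

# Sanity check: a negative-cost reward should never be positive
assert all(r <= 0.0 for r in step_rewards)
print(step_rewards, episode_reward)
```

If a check like the final assertion fails for your own reward function, the reward is probably the cost itself rather than its negative, and the agent would then be rewarded for large errors.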

A high reward suggests that the agent's decisions, as evaluated by your LQG-based reward function, are leading to the desired outcomes. However, a single high-reward episode is not enough: evaluate the agent over multiple episodes, and ideally from different initial conditions, to confirm that its behavior is consistent and robust.
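Such an evaluation over several initial conditions could look like the sketch below. It is a toy illustration in plain Python, not the toolbox workflow: the scalar plant parameters `a` and `b`, the weights `q` and `r`, the noise level, and the fixed linear gain standing in for a trained agent are all assumptions made for this example.

```python
import random
import statistics


def run_episode(x0, gain, rng, steps=50, a=0.9, b=0.5, q=1.0, r=0.1, noise_std=0.05):
    """Simulate x[k+1] = a*x[k] + b*u[k] + w[k] with u = -gain*x and return the summed reward."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = -gain * x                         # stand-in for the trained policy
        total += -(q * x * x + r * u * u)     # negative quadratic stage cost
        x = a * x + b * u + rng.gauss(0.0, noise_std)  # Gaussian process noise
    return total


rng = random.Random(42)  # fixed seed so the evaluation is repeatable
initial_states = [-2.0, -1.0, 0.0, 1.0, 2.0]

# One episode per initial condition
returns = [run_episode(x0, gain=1.0, rng=rng) for x0 in initial_states]

mean_return = statistics.mean(returns)
spread = statistics.stdev(returns)
print(returns, mean_return, spread)
```

A mean that stays close to zero with a small spread across initial conditions would support the conclusion that the policy is robust, whereas a large spread would suggest that the high reward depends on where the episode starts.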

Hope this helps!
