RL DDPG: reward should be negative, however episode Q0 is becoming positive
Hello Everyone,
I am building an LQR-type controller. My reward is the negative of the LQR quadratic cost, given as x'Qx + u'Ru. When I train the DDPG agent, the episode Q0 becomes positive. According to my understanding, episode Q0 is the critic's estimate of the discounted long-term reward at the start of each episode, given the initial observation of the environment. How is this possible? Why does episode Q0 go positive when the reward function is designed to be negative?
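For reference, here is a minimal sketch of how such a reward is computed inside a custom step function (the Q, R, x, and u values below are placeholders for illustration, not my actual matrices):

% LQR-style penalty reward: with Q and R symmetric positive definite,
% x'*Q*x + u'*R*u >= 0, so the reward can never be positive.
Q = diag([10 1]);            % hypothetical state weights
R = 0.1;                     % hypothetical control weight
x = [0.5; -0.2];             % example state vector
u = 0.3;                     % example action
reward = -(x'*Q*x + u'*R*u);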
0 comments
Answers (2)
UDAYA PEDDIRAJU
on 26 Oct 2023
Hello Muhammad,
I understand that you are facing an issue where the episode Q0 value becomes positive even though the reward is designed to be negative. To address this, I suggest considering the following:
- Scale mismatch between Q0 and the episode reward: Q0 is the critic's estimate of the discounted return from the initial observation, and early in training it can differ greatly in scale, and even in sign, from the actual episode reward. Note that the "Show Episode Q0" option in Episode Manager only controls whether this estimate is plotted; it does not change training. A more direct check is to recompute the discounted return yourself and compare it with Q0, as in the first sketch below.
- Check what the critic is trained toward: the DDPG algorithm handles both positive and negative rewards, and the critic's target is the return, i.e., the discounted sum of rewards for a specific state-action pair from that point until the end of the trajectory. If every step reward is negative, every return is negative as well, so a persistently positive Q0 indicates that the critic is overestimating or has not yet converged.
- Simplify the critic network: a smaller critic whose output naturally falls on the same scale as the episode returns can help align the Q0 estimates with the actual rewards, providing more reliable feedback for the agent's learning process; see the second sketch below.
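As a sketch of the first check (assuming env and agent are your environment and trained rlDDPGAgent; the episode length is illustrative):

% Simulate one episode with the trained agent, recompute the discounted
% return G0 by hand, and compare it with the Episode Q0 value shown in
% Episode Manager.
simOpts = rlSimulationOptions('MaxSteps', 500);
out = sim(env, agent, simOpts);
rewards = out.Reward.Data;                  % per-step rewards of the episode
gamma = agent.AgentOptions.DiscountFactor;
G0 = 0;
for k = numel(rewards):-1:1                 % accumulate the return backwards
    G0 = rewards(k) + gamma*G0;
end
disp(G0)   % with all-negative rewards, G0 must be negative; a positive
           % Q0 then points to critic overestimation, not the reward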
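And a sketch of a deliberately small critic (assuming obsInfo = getObservationInfo(env) and actInfo = getActionInfo(env); layer sizes are illustrative, and rlQValueFunction requires R2022a or later):

% State and action paths feed a single scalar Q-value output; fewer
% layers and units keep that output on a scale comparable to the returns.
obsPath = [
    featureInputLayer(obsInfo.Dimension(1), Name="obsIn")
    fullyConnectedLayer(32, Name="obsFC")];
actPath = [
    featureInputLayer(actInfo.Dimension(1), Name="actIn")
    fullyConnectedLayer(32, Name="actFC")];
commonPath = [
    additionLayer(2, Name="add")             % combine both paths
    reluLayer
    fullyConnectedLayer(1, Name="qValue")];  % scalar Q-value
net = layerGraph(obsPath);
net = addLayers(net, actPath);
net = addLayers(net, commonPath);
net = connectLayers(net, "obsFC", "add/in1");
net = connectLayers(net, "actFC", "add/in2");
critic = rlQValueFunction(net, obsInfo, actInfo, ...
    ObservationInputNames="obsIn", ActionInputNames="actIn");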
Further, you can refer to the MathWorks documentation.
I hope this helps!
0 comments