Transient value problem of the variable in reward function of reinforcement learning

Yihao Wan on 22 Mar 2021

Direct link to this question:

https://la.mathworks.com/matlabcentral/answers/779882-transient-value-problem-of-the-variable-in-reward-function-of-reinforcement-learning

Commented: Yihao Wan on 23 Mar 2021

Accepted Answer: Emmanouil Tzorakoleftherakis

Hello, I encountered a problem when designing the reward function. In my Simulink environment, I want to incorporate some variables into the reward function. During training of the RL agent, these variables only converge after about 0.06 s, while the agent is trained from 0 s. Putting the RL block in a subsystem with an Enable block does not help.

From my understanding, this transient will influence the value of the reward function, which may result in a poorly trained agent. Does anyone have any suggestions regarding this question?

Thank you very much.

Accepted Answer

Emmanouil Tzorakoleftherakis on 22 Mar 2021

Direct link to this answer:

https://la.mathworks.com/matlabcentral/answers/779882-transient-value-problem-of-the-variable-in-reward-function-of-reinforcement-learning#answer_654817

You can put the agent block under a triggered subsystem and set it to begin training after 0.06 seconds.
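To illustrate why delaying the agent helps, here is a minimal sketch in plain Python. It is not Simulink or Reinforcement Learning Toolbox code. Only the 0.06 s start time comes from this thread; the signal model, reward, sample time, episode length, and all function names are hypothetical and chosen purely for illustration.

```python
import math

T_START = 0.06  # settling time quoted in the question (s)
TS = 0.01       # hypothetical agent sample time (s)
TF = 0.2        # hypothetical episode length (s)
TAU = 0.012     # hypothetical time constant: signal is settled by about 0.06 s


def signal(t):
    """Hypothetical measured variable: first-order rise toward 1.0."""
    return 1.0 - math.exp(-t / TAU)


def reward(t):
    """Hypothetical reward: negative squared error from the settled value."""
    return -(1.0 - signal(t)) ** 2


def run_episode(gated):
    """Return (sum of rewards seen by the agent, number of agent steps).

    With gated=True the agent is skipped until T_START, mimicking an
    agent that is only triggered once the transient has died out.
    """
    n_steps = round(TF / TS)
    first_step = round(T_START / TS)  # index of the first gated agent step
    total, agent_steps = 0.0, 0
    for k in range(n_steps + 1):
        if gated and k < first_step:
            continue  # agent not triggered yet: no reward is collected
        total += reward(k * TS)
        agent_steps += 1
    return total, agent_steps


if __name__ == "__main__":
    for gated in (False, True):
        total, steps = run_episode(gated)
        print(f"gated={gated}: agent steps={steps}, episode reward={total:.6f}")
```

In this toy setup the ungated episode reward is dominated by the error during the start-up transient, which the agent cannot influence, while the gated episode only accumulates reward once the variable has settled. Note that gating also reduces the number of agent steps per episode, so any step limit has to be chosen with the shorter run in mind.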

5 comments (the 3 older comments are not shown)

Emmanouil Tzorakoleftherakis on 23 Mar 2021

I believe it should be 40, yes. There is a counter implemented internally that keeps track of how many times the RL Agent block will run.

Yihao Wan on 23 Mar 2021

Thank you very much for your help.

Categories

Control Systems > Reinforcement Learning Toolbox > Environments

Products

Simulink

Release

R2021a
