Measures to improve computation time with reinforcement learning block in Simulink

I am using the Reinforcement Learning Toolbox to run control tasks, in particular with a DDPG agent. Each episode lasts 100 seconds with a 0.01 s simulation time step (the control time step is 0.1 s, i.e. the RL control block is called that often). Unfortunately, the computation time is unmanageably high.
I have tried to reduce the training of the actor and critic neural networks to every 5 episodes by using a periodic TargetUpdateMethod and changing the TargetUpdateFrequency. However, a deeper analysis makes it clear that it is the computational time taken by each episode that is too high, which points the culprit to the RL Simulink block.
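For reference, the relevant agent options were set roughly like this (a sketch only: option names are from rlDDPGAgentOptions, and the values are placeholders rather than my exact settings; the rest of the agent and environment setup is omitted):

% Sketch of the DDPG agent options I have been changing.
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', 0.1, ...                 % control time step: the RL block runs every 0.1 s
    'TargetUpdateMethod', 'periodic', ...  % periodic target sync instead of smoothing
    'TargetUpdateFrequency', 5, ...        % the value I changed, hoping it controlled how often training happens
    'MiniBatchSize', 64, ...
    'ExperienceBufferLength', 1e6);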
The way I see it, the block should only have to run the neural networks (essentially matrix multiplications) and store the new experience point in the replay memory (perhaps a few more matrix operations once the memory is full). So this does not fully explain the large overhead to me.
Equivalent code runs (more) efficiently in Python, so it is clear I am not fully exploiting the MATLAB/C++ implementation.
Any advice on how I could try to improve the computational efficiency?

Answers (1)

Emmanouil Tzorakoleftherakis on 27 Jan 2020
Edited: Emmanouil Tzorakoleftherakis on 27 Jan 2020
Hi Enrico,
Changing the values of TargetUpdateMethod and TargetUpdateFrequency will not change how often training happens, only how often the target copies of the actor and critic are synced (remember DDPG is an off-policy method, so it keeps two copies of both the actor and the critic).
If you look at the algorithm description here, you will see that learning happens at steps 6 and 7, and these happen at every agent time step (0.1 s in your example), which is why you see this slowdown. So the quick things to try are 1) increase the agent sample time, 2) reduce the episode duration, and 3) reduce the mini-batch size.
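To make those three concrete, here is a rough sketch of where each knob lives (option names from rlDDPGAgentOptions and rlTrainingOptions; the values below are placeholders, not recommendations):

% Placeholder values; the point is only which option controls each knob.
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', 0.2, ...        % 1) larger agent sample time -> fewer learning steps per episode
    'MiniBatchSize', 32);         % 3) smaller mini-batch -> cheaper gradient updates
trainOpts = rlTrainingOptions( ...
    'MaxStepsPerEpisode', 250);   % 2) shorter episodes (250 steps * 0.2 s = 50 s of simulated time)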
One additional thing to try is to parallelize training. You can use Parallel Computing Toolbox for that; to set it up, you pretty much only need to set a flag in the training options (see e.g. here).
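Something along these lines should do it (a minimal sketch; it assumes 'agent' and 'env' already exist, and the parallel settings shown are just one possible configuration):

% Requires Parallel Computing Toolbox.
trainOpts = rlTrainingOptions('UseParallel', true);
trainOpts.ParallelizationOptions.Mode = 'async';   % run workers asynchronously (one option; check the doc)
trainingStats = train(agent, env, trainOpts);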
We are also working on adding more training algorithms for continuous action spaces that are more sample efficient, so I would check back when R2020a goes live.
