Measures to improve computation time with reinforcement learning block in Simulink

I am using the Reinforcement Learning Toolbox to run control tasks, in particular with a DDPG agent. Each episode lasts 100 seconds with a 0.01 s simulation time step (the control time step is 0.1 s, i.e. the RL control block is called that often). Unfortunately, the computation time is unmanageably high.
I have tried to reduce the training of the actor and critic neural networks to every 5 episodes by using a periodic TargetUpdateMethod and changing the TargetUpdateFrequency. However, a deeper analysis makes it clear that it is the computational time taken by each episode that is too high, which points the culprit to the RL Simulink block.
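For reference, the relevant agent options were set roughly like this (a sketch only: option names are from rlDDPGAgentOptions, and the values are placeholders rather than my exact settings; the rest of the agent and environment setup is omitted):

% Sketch of the DDPG agent options I have been changing.
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', 0.1, ...                 % control time step: the RL block runs every 0.1 s
    'TargetUpdateMethod', 'periodic', ...  % periodic target sync instead of smoothing
    'TargetUpdateFrequency', 5, ...        % the value I changed, hoping it controlled how often training happens
    'MiniBatchSize', 64, ...
    'ExperienceBufferLength', 1e6);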
The way I see it, the block should only have to run the neural networks (essentially matrix multiplications) and store the new experience point in the replay memory (perhaps a few more matrix operations once the memory is full). So this does not fully explain the large overhead to me.
Equivalent code runs (more) efficiently in Python, so it is clear I am not fully exploiting the MATLAB/C++ implementation.
Any advice on how I could try to improve the computational efficiency?

Answers (1)

Emmanouil Tzorakoleftherakis on 27 Jan 2020
Edited: Emmanouil Tzorakoleftherakis on 27 Jan 2020
Hi Enrico,
Changing the values of TargetUpdateMethod and TargetUpdateFrequency will not change how often training happens, only how often the target copies of the actor and critic are synced (remember DDPG is an off-policy method, so it keeps two copies of both the actor and the critic).
If you look at the algorithm description here, you will see that learning happens at steps 6 and 7, and these happen at every agent time step (0.1 s in your example), which is why you see this slowdown. So the quick things to try are 1) increase the agent sample time, 2) reduce the episode duration, and 3) reduce the mini-batch size.
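To make those three concrete, here is a rough sketch of where each knob lives (option names from rlDDPGAgentOptions and rlTrainingOptions; the values below are placeholders, not recommendations):

% Placeholder values; the point is only which option controls each knob.
agentOpts = rlDDPGAgentOptions( ...
    'SampleTime', 0.2, ...        % 1) larger agent sample time -> fewer learning steps per episode
    'MiniBatchSize', 32);         % 3) smaller mini-batch -> cheaper gradient updates
trainOpts = rlTrainingOptions( ...
    'MaxStepsPerEpisode', 250);   % 2) shorter episodes (250 steps * 0.2 s = 50 s of simulated time)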
One additional thing to try is to parallelize training. You can use Parallel Computing Toolbox for that; to set it up, you pretty much only need to set a flag in the training options (see e.g. here).
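Something along these lines should do it (a minimal sketch; it assumes 'agent' and 'env' already exist, and the parallel settings shown are just one possible configuration):

% Requires Parallel Computing Toolbox.
trainOpts = rlTrainingOptions('UseParallel', true);
trainOpts.ParallelizationOptions.Mode = 'async';   % run workers asynchronously (one option; check the doc)
trainingStats = train(agent, env, trainOpts);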
We are also working on adding more training algorithms for continuous action spaces that are more sample efficient, so I would check back when R2020a goes live.
