Parallel workers automatically shutting down in the middle of RL parallel training.
Hello,
I am currently training a reinforcement learning PPO agent on a Simulink model with UseParallel=true. The training should run for 5000 episodes in total (about 10 to 11 hours), but I'm noticing that as the training goes on, more and more workers in the parallel pool automatically shut down, making training slower and slower as it progresses. I start with 8 workers, and they consistently drop off one at a time until errors are generated.
I've been seeing this consistently in every training run, and would like to know if there are any workarounds.
For the parpool, I am letting MATLAB start it automatically with all options set to default. I have also tried varying the number of workers, but the same thing happens.
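For reference, one way to take the automatic pool startup out of the equation is to create the pool explicitly before calling train, so its size and idle timeout are known. This is a minimal sketch, not from the original thread; the 'local' profile name and the 8-worker size are assumptions to match the setup described above.

```matlab
% Sketch: create the parallel pool explicitly instead of relying on the
% automatic default, so the worker count and idle timeout are under our control.
pool = gcp('nocreate');                          % get current pool, if any, without creating one
if isempty(pool)
    % 'local' profile and 8 workers are assumptions; IdleTimeout=Inf keeps
    % idle workers from being shut down between episodes.
    pool = parpool('local', 8, 'IdleTimeout', Inf);
end
fprintf('Pool has %d workers.\n', pool.NumWorkers);  % check how many workers remain
```

Checking pool.NumWorkers periodically (or after training stops) can also help confirm whether workers are actually exiting rather than just idling.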
2 comments
Emmanouil Tzorakoleftherakis
on 10 May 2023
What errors are you seeing? Maybe try training on a single worker initially to make sure you don't see any errors before moving to parallel.
Matteo D'Ambrosio
on 10 May 2023
Edited: Matteo D'Ambrosio
on 10 May 2023
Accepted Answer
More Answers (0)