Parallel reinforcement learning on HPC with warning "Received duplicate id = x from worker"

When I train a reinforcement learning agent on an HPC cluster using Parallel Computing Toolbox, I get the warning "Received duplicate id = 22 from worker" (or another id) after, for example, 180 training episodes. The training then seems to stop, with no further error or warning. I start the .m script with these commands:
module load matlab/R2021a
matlab -nodisplay < rl_training.m
When I set
trainOpts.UseParallel = false;
I often get the warning "Error reading character from command line". Does anyone know why these messages occur, and is there a way to continue the training?
5 Comments
Image Analyst on 2 Dec 2021
If you have a maintenance contract in place, I'd call them on the phone. Of course, you can use email, as @Raymond Norris said. I never use email or a support page: when I encounter a problem I need an immediate solution, so I call them.
Walter Roberson on 5 Dec 2021
I never call them, myself -- I open support cases, where I can describe the problem and include code and results to show clearly what is expected and what is received instead. 85% of the time the response is going to be "You are right, that's not good, the developers have been notified and it might get fixed some day".


Answers (0)
