R2024b parpool crashing when being activated with 24 workers.

Matteo D'Ambrosio on 25 Sep 2024
Commented: Sergio E. Obando on 2 Jan 2025 at 23:02
Update: These crashes seem to happen quite randomly, regardless of the number of workers used.
Dear all,
Whenever I try to start a parpool with more than 20 workers on the Processes profile, an error occurs and the parallel pool shuts down automatically. I have tried validating the profile with the Cluster Profile Manager, and any value above 20 workers produces this error, even though my CPU has 24 cores. I never experienced this problem in MATLAB R2024a, where I could always start parallel pools with up to 24 workers.
Is there a known fix for this? It has only been happening since I updated to MATLAB R2024b. My CPU is an Intel Core i9-14900KF.
Thanks in advance. I have attached the error below in case it is useful, along with a few snapshots of the Cluster Profile Manager validations.
Command window output:
Starting parallel pool (parpool) using the 'Processes' profile ...
Error using parpool (line 133)
Parallel pool failed to start with the following error. For more detailed information,
validate the profile 'Processes' in the Cluster Profile Manager.
Error in parallel.internal.ui.PoolHelper.startPool (line 12)
parpool();
^^^^^^^^^
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>@()checker.checkState()
(line 121)
The parallel pool job errored with the following message: MATLAB worker shut down
unexpectedly with status 1 during task execution.
Parallel pool using the 'Processes' profile is shutting down.
This parallel pool has been shut down.
Caused by:
The client lost connection to worker 2 (Task 2; Host: localhost), potentially due to
network issues or errors during the interactive communicating job.
With 16 workers (same output when using 20): [Cluster Profile Manager validation screenshot]
With 24 workers: [Cluster Profile Manager validation screenshot]
3 Comments
Matteo D'Ambrosio on 25 Sep 2024
Edited: Matteo D'Ambrosio on 25 Sep 2024
Thanks for the reply!
Yes, the error messages are the same; the only difference is the ID of the worker that fails.
Chao Wang on 26 Dec 2024 at 7:41
I have run into the same problem and have no idea why it happens.


Answers (1)

Sergio E. Obando on 25 Sep 2024
While not exactly the same error, this post covers some good troubleshooting steps: Validation Fails
If you prefer or if those steps do not resolve your issue, I would highly recommend contacting Technical Support.
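Before contacting support, it may help to start the pool programmatically at several sizes and record which attempts fail, so that the report includes concrete numbers. The following is a minimal sketch, assuming Parallel Computing Toolbox and the default 'Processes' profile named in the original post; the worker counts and variable names are illustrative and not taken from the thread:

```matlab
% Diagnostic sketch: try several pool sizes and record which ones start.
% Assumption: the 'Processes' profile exists (default in recent releases).
c = parcluster('Processes');
fprintf('Profile NumWorkers limit: %d\n', c.NumWorkers);

counts = [8 16 20 24];                    % illustrative pool sizes
counts = counts(counts <= c.NumWorkers);  % skip sizes the profile disallows
started = false(size(counts));

for k = 1:numel(counts)
    delete(gcp('nocreate'));              % close any existing pool first
    try
        pool = parpool(c, counts(k));
        started(k) = (pool.NumWorkers == counts(k));
    catch err
        % Keep the message so it can be attached to a support request.
        fprintf('Pool with %d workers failed: %s\n', counts(k), err.message);
    end
end

delete(gcp('nocreate'));                  % clean up the last pool
disp(table(counts(:), started(:), 'VariableNames', {'Workers', 'Started'}));
```

Since the crashes were later reported to occur at random, running the loop more than once gives a better picture than a single pass.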
8 Comments
Raffael Kozerski on 2 Jan 2025 at 20:13
Same here:
Running a simulation with more than 60 workers crashes with R2024b on several machines.
The same simulation runs fine with R2024a using 700 cores/MATLAB workers.
I have no idea why R2024b crashes; it also crashes when running the SPMD validation test.
The job log contains only a "Matlab crashed on worker XXX" message and no other useful information.
Raffael
Sergio E. Obando on 2 Jan 2025 at 23:02
Raffael, please reach out to technical support. They can help you debug this issue and see if the root cause is similar to the one from the original post.


Release

R2024b
