Problems opening a parpool on a remote cluster
I have built a cluster of 5 computers, each with a 4-core CPU. I have created an MJS and made 15 workers available in it. However, I cannot use all 15 workers.
If I use the code
parpool;
Starting parallel pool (parpool) using the '*****' profile ... connected to 1 workers.
It connects to only one worker.
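One way to see what the MJS itself reports, as opposed to what parpool manages to connect to, is to query the cluster object from the client. This is only a minimal sketch, not something that has been run against this cluster: it assumes the profile is named 'QUANT' (the name that appears in the validation details further down) and that the client can reach the job manager.

```matlab
% Assumption: 'QUANT' is the cluster profile name shown in the validation details.
c = parcluster('QUANT');

% Worker counts as reported by the MJS.
fprintf('Total workers: %d\n', c.NumWorkers);
fprintf('Idle workers:  %d\n', c.NumIdleWorkers);
fprintf('Busy workers:  %d\n', c.NumBusyWorkers);

% List each idle worker together with the host it runs on.
idle = c.IdleWorkers;
for k = 1:numel(idle)
    fprintf('%s on %s\n', idle(k).Name, idle(k).Host);
end
```

If fewer than 15 workers are listed, or all of the idle ones are on a single host, that would narrow down where the problem lies.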
When I instead run the following code:
parpool(15);
I get an error:
Starting parallel pool (parpool) using the '******' profile ...
Error using parpool (line 111)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile '******' in the Cluster Profile Manager.)
Caused by:
Error using parallel.internal.pool.InteractiveClient/start (line 358)
Failed to initialize the interactive session.
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 726)
The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.). Original cancel message: Job setup failed - MATLAB will now exit and restart.
Then I validated the profile in the Cluster Profile Manager, and this is what I got:
The details are:
VALIDATION DETAILS
Profile: QUANT
Scheduler Type: MJS

Stage: Cluster connection test (parcluster)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Job test (createJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: SPMD job test (createCommunicatingJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Pool job test (createCommunicatingJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Parallel pool test (parpool)
Status: Failed
Description: The validation stage encountered a MATLAB exception.
Command Line Output: (none)
Error Report: Failed to initialize the interactive session.
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 726)
The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.). Original cancel message: Job setup failed - MATLAB will now exit and restart.
Debug Log: (none)
Can anyone help me solve this problem? I have checked, and it is not caused by a license issue. Thanks.