Problem opening a parpool on a remote cluster

Xie Ya on 29 Jul 2015
I have built a cluster of 5 computers, each with a 4-core CPU. I have created an MJS and made 15 workers available in it. However, I cannot use all 15 workers.
If I run:
parpool;
Starting parallel pool (parpool) using the '*****' profile ... connected to 1 workers.
I am connected to only one worker.
When I instead run:
parpool(15);
I get an error:
Starting parallel pool (parpool) using the '******' profile ...
Error using parpool (line 111)
Failed to start a parallel pool. (For information in addition to the causing error, validate the profile '******' in the Cluster Profile Manager.)
Caused by:
Error using parallel.internal.pool.InteractiveClient/start (line 358)
Failed to initialize the interactive session.
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 726)
The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.). Original cancel message: Job setup failed - MATLAB will now exit and restart.
Then I validated the profile in the Cluster Profile Manager. The details are:

VALIDATION DETAILS
Profile: QUANT
Scheduler Type: MJS

Stage: Cluster connection test (parcluster)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Job test (createJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: SPMD job test (createCommunicatingJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Pool job test (createCommunicatingJob)
Status: Passed
Description: Validation Passed
Command Line Output: (none)
Error Report: (none)
Debug Log: (none)

Stage: Parallel pool test (parpool)
Status: Failed
Description: The validation stage encountered a MATLAB exception.
Command Line Output: (none)
Error Report: Failed to initialize the interactive session.
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 726)
The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.). Original cancel message: Job setup failed - MATLAB will now exit and restart.
Debug Log: (none)
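
For reference, here is a minimal sketch of how the worker counts the MJS reports could be checked from the client before asking for a pool. It assumes the profile is the 'QUANT' profile shown in the validation report, and that the target of 15 workers is the number from this post. It only shows what the scheduler sees; it does not by itself explain the "Job setup failed" message.

```matlab
% Sketch only: 'QUANT' is the profile name from the validation report above.
c = parcluster('QUANT');

% Worker counts as reported by the MJS.
fprintf('Total workers: %d\n', c.NumWorkers);
fprintf('Idle workers:  %d\n', c.NumIdleWorkers);
fprintf('Busy workers:  %d\n', c.NumBusyWorkers);

% Close any pool that is already open so its workers are released.
delete(gcp('nocreate'));

% Request at most as many workers as are currently idle (15 is the target here).
n = min(15, c.NumIdleWorkers);
if n > 0
    pool = parpool(c, n);
    fprintf('Pool opened with %d workers.\n', pool.NumWorkers);
else
    disp('No idle workers are available on this cluster.');
end
```

If the idle count is below 15, a plain parpool(15) request cannot be satisfied, which would be one thing to rule out first.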
Can anyone help me solve this problem? I have checked, and it is not caused by a license issue. Thanks.

Answers (0)
