Is there an easy way to find out which workers are running on the same host in a Generic Cluster job so I can efficiently allgather?
Frank Moore-Clingenpeel
on 30 Aug 2022
Commented: Frank Moore-Clingenpeel
on 30 Aug 2022
Say I have the following script which submits a job to a Generic parallel cluster, which has procsPerNode=2:
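%myScript.m
c = parcluster('MyGenericClusterProfile');       % load the Generic cluster profile
j = createCommunicatingJob(c, 'Type', 'spmd');   % every worker runs the same task and can communicate
j.NumWorkersRange = [4 4];                       % request exactly 4 workers
createTask(j, @mySpmdFunction, 0, {});           % 0 output arguments, no input arguments
submit(j);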
What this will do is request 2 nodes from my cluster, each of which will run 2 MATLAB workers in parallel. Together, the 4 workers will run mySpmdFunction as though it were launched within an spmd statement (so they can use labSend to communicate, labindex to get an ID, etc.).
My question is, is there any way for the workers to know which other workers are 'local', i.e., which ones reside on the same piece of hardware versus which ones are remote? A way to query this information at run time (reflection) is preferred, but if that's not available, will MATLAB consistently assign workers to nodes sequentially (so that workers 1 and 2 will always share a node and workers 3 and 4 will always share a node in the example)? If there's no way to query which workers share nodes, is there a way to find the GenericCluster the workers are running on so I can read its procsPerNode property?
For that matter, is there a built-in allgather function? I'm really only investigating this to implement my own allgather from scratch...
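To make the question concrete, here is a minimal sketch of one possible approach, not a confirmed answer: each worker asks the operating system for its host name, the names are gathered on every worker with gcat (which concatenates a value across workers and replicates the result on all of them, so it behaves much like an allgather), and each worker then picks out the indices that share its host. The helper findLocalWorkers is a hypothetical name introduced for illustration, and the sketch assumes the `hostname` command is available on every node and returns a distinct name per machine. In newer releases, labindex and gcat are also available as spmdIndex and spmdCat.

```matlab
function mySpmdFunction()
% Sketch only: discover which workers share this worker's host.

% Ask the operating system for the host name (assumes the `hostname`
% command exists on every node of the cluster).
[status, out] = system('hostname');
assert(status == 0, 'Could not query the host name.');
myHost = strtrim(out);

% gcat concatenates the value from every worker, ordered by labindex,
% and replicates the result on all workers.
allHosts = gcat({myHost});            % 1-by-numlabs cell array of names

% Indices of the workers (including this one) on the same host.
localWorkers = findLocalWorkers(allHosts, labindex);
fprintf('Worker %d on %s shares a host with workers %s\n', ...
    labindex, myHost, mat2str(localWorkers));
end

function localIdx = findLocalWorkers(hostnames, myIndex)
% findLocalWorkers is a hypothetical helper, not a built-in function.
% hostnames{k} is the host name reported by worker k; the result lists
% every worker index whose host name matches that of worker myIndex.
localIdx = find(strcmp(hostnames, hostnames{myIndex}));
end
```

Because the grouping comes from the host names the workers report themselves, this does not rely on MATLAB assigning workers to nodes in any particular order, and it does not need the procsPerNode value.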