HPC Slurm --ntasks and Matlab parcluster NumWorkers question

Hi,
I have a question about the number of tasks (--ntasks) in Slurm when executing a .m file that uses 'UseParallel' to run ONE genetic algorithm ('ga').
Each node on our HPC cluster has at most 64 physical CPUs.
In the Slurm batch file, this works:
#SBATCH --cpus-per-task=64
#SBATCH --nodes=1
#SBATCH --ntasks=1
But if I request
#SBATCH --cpus-per-task=128
#SBATCH --nodes=2
#SBATCH --ntasks=1
it is not allowed: "sbatch: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1"
I simply expect to get 64 CPUs from 1 node, 128 CPUs from 2 nodes, and so on, to run ONE TASK ONLY in the MATLAB .m file below.
But Slurm tells me it cannot use 2 nodes to run 1 task. Do I instead have to set "ntasks=2" in the batch file to request 64+64 CPUs, and then do some trick in the MATLAB .m file so that MATLAB treats them as 128 CPUs in total for 1 task?
In the MATLAB .m file, I did:
num_cpu = 64; % I want to increase this to 128
parpool(parcluster, num_cpu)
options = optimoptions('ga', 'UseParallel', true, 'UseVectorized', false, ...
    'PopulationSize', num_cpu-1);
[x, fval] = ga(@(x)cost_fun(x), nvars, [], [], [], [], [], [], [], options); % nvars: number of design variables
Since Slurm does not allow multiple nodes for one task, I was previously advised to define a cluster profile in MATLAB instead, so the HPC accepts multiple nodes: https://www.mathworks.com/help/parallel-computing/discover-clusters-and-use-cluster-profiles.html
Is there a way to let NumWorkers be 128 by using 2 nodes and 1 task, in either the MATLAB .m file or the Slurm batch file?

Accepted Answer

Raymond Norris on 17 Feb 2021
In Slurm, a single task (i.e., MATLAB) cannot run across multiple nodes. Let's look at a couple of options.
  • MATLAB on a single node, using 64 cores for running linear algebra routines. In this case, there's only 1 task (MATLAB), but you want to assign 64 cores so that it can spawn threads on those 64 cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
This would allow for 64 computational threads running on 64 cores.
maxNumCompThreads
ans =
64
  • MATLAB on a single node, using 64 cores to run a local pool. In this case, there's only 1 task (MATLAB), but you want to assign 64 cores so that it can spawn processes on those 64 cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
You could then run the following to use parallel algorithms (parfor, spmd, etc.); a sketch for deriving these counts from the Slurm environment follows below.
p = parpool('local',64);
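Rather than hard-coding 64 in either case, you could derive the count from the job environment. Here is a minimal sketch, assuming the standard SLURM_CPUS_PER_TASK variable that Slurm sets inside a job (use whichever of the last two lines matches your case):
% Read the core count Slurm granted this task; fall back to the
% machine's default if the variable is not set (e.g., outside a job).
nCores = str2double(getenv('SLURM_CPUS_PER_TASK'));
if isnan(nCores)
    nCores = maxNumCompThreads;  % defaults to the number of physical cores
end
maxNumCompThreads(nCores);       % option 1: cap computational threads
p = parpool('local', nCores);    % option 2: one local worker per core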
In both examples, you're requesting the same resources from Slurm, but making use of them slightly differently. What you'd like is to start a pool of workers across two nodes. Therefore, you must spawn a MATLAB job that then spawns a MATLAB Parallel Server job. The "outer" job only requires a single task; it's the "inner" job that will request the 128 cores. For instance:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
Then from MATLAB:
num_cpu = 128;
parpool(parcluster('slurm'), num_cpu); % Assumes a Slurm profile exists (see below for more info)
options = optimoptions('ga', 'UseParallel', true, 'UseVectorized', false, ...
    'PopulationSize', num_cpu-1);
[x, fval] = ga(@(x)cost_fun(x), nvars, [], [], [], [], [], [], [], options);
Now, the parpool command will spawn an "inner" job, requesting 128 cores from Slurm (across 2+ nodes) to run your parallel pool.
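Once that pool is up, here is a quick sanity check (a sketch, assuming the pool above is open) that the workers really landed on more than one node:
% Each worker reports the hostname of the node it runs on.
spmd
    [~, name] = system('hostname');
    host = string(strtrim(name));
end
% host is a Composite on the client; list the distinct node names.
disp(unique([host{:}]))  % expect 2+ names for a 128-worker pool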

More Answers (0)
