How are gpuArrays handled inside parfor?

Garrett Good on 27 Nov 2017
Edited: Joss Knight on 14 Feb 2025 at 11:28
I've been going through various posts and am still a little unsure about how gpuArrays and functions behave inside parfor loops.
FYI, what I'm not doing is trying to use multiple GPUs or GPU workers as MATLAB workers. In my current application, each worker is running an iteration of an optimization algorithm, with the parfor code mostly executing the cost function.
An expensive part of the parfor body has some large matrix multiplications and interpolations, and I know this runs much faster on the GPU. Can multiple workers access a single GPU simultaneously (up until they exhaust the GPU memory), or does this get serialized so that there's no benefit, even if a single iteration doesn't fully use the GPU?
On that note, can a constant gpuArray (or 'object-wrapped' gpuArray?) be read simultaneously, or will each worker make its own copy on the GPU so that the worker can alter it?
Many thanks in advance for your expertise!

Answers (2)

Joss Knight on 27 Nov 2017
Edited: Joss Knight on 27 Nov 2017
Yes, they can all use the same GPU. By default, anything you run on the same GPU from different processes will run in serial. However, if you are also doing a lot of host-side code, the other workers can be getting on with that while they take turns with the GPU, so you can still get a benefit. Just be wary of how much memory you are using. By default, each MATLAB process will hog up to a quarter of GPU memory. If you have four or more workers and you're using a lot of memory, you could find your GPU running out.
If you are on Linux, you can run the NVIDIA Multi-Process Service (MPS) to allow each process to use the GPU concurrently. However, this often doesn't gain you much, because code that is using the GPU 'well' will not leave any spare compute for another process. A bit like multi-threading on a single-core CPU, the apparent concurrency is still bottlenecked by the fact that there's actually only one processor.
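As a rough illustration of the pattern described in this answer, here is a minimal sketch of several process workers sharing one GPU from inside parfor. The matrix sizes, variable names and the toy cost computation are all assumptions made up for the example; they are not from the original question. It assumes a process-based pool, where each worker is a separate MATLAB process and so holds its own copy of the matrix on the device, wrapped in a parallel.pool.Constant so that the copy is made once per worker rather than once per iteration.

```matlab
% Sketch only: sizes, names and the "cost" are hypothetical.
A = rand(2000, 'single');      % large constant matrix, created on the client
nIter = 8;
cost = zeros(nIter, 1);

% Build the gpuArray once on each worker. Each process worker gets its
% own copy on the device, so keep an eye on total GPU memory.
Ac = parallel.pool.Constant(@() gpuArray(A));

parfor k = 1:nIter
    % Host-side work: other workers can be doing this while one of
    % them is taking its turn on the GPU.
    x = rand(2000, 1, 'single') * k;

    % GPU work: the multiplication runs on the shared device.
    y = Ac.Value * gpuArray(x);

    % gather brings the scalar result back to host memory.
    cost(k) = gather(sum(y .^ 2));
end
```

Whether this beats a plain CPU parfor depends on how much host-side work each iteration has to overlap with the GPU calls, so it is worth timing both versions.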

Giorgio on 14 Feb 2025 at 10:04
Hello,
I am working on the same idea. However, I wonder whether running just one of the workers on the GPU would speed up the process more than running all the workers on the GPU. Did you try something like that? Is it possible to do that?
Many thanks!
Giorgio
  1 Comment
Joss Knight on 14 Feb 2025 at 11:28
Edited: Joss Knight on 14 Feb 2025 at 11:28
Hi Giorgio. You might want to ask a separate question because this one is very old now!
Yes, there are some situations where only having one worker using the GPU might work for you.
If you are running independent jobs (e.g. parfor or parfeval) and you write code that can run on the GPU or not, then it's just a matter of ensuring that each worker knows whether it's supposed to use the GPU or not. You might use spmd to do this, for example:
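parpool('Processes',6);
% Flag worker 1 as the only one that should use the GPU
spmd, useGpu = spmdIndex==1; end
% Wrap the per-worker flag so it can be passed to parfeval/parfor
useGpuConst = parallel.pool.Constant(useGpu);
phrase = ["will not","will"];
% Logical flag + 1 gives index 1 ("will not") or 2 ("will")
fut = parfevalOnAll(@(useGpu) "This worker " + phrase(useGpu.Value+1) + " use the GPU",1,useGpuConst);
fetchOutputs(fut)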
Output:
ans =
6×1 string array
"This worker will use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
Alternatively, set the CUDA_VISIBLE_DEVICES environment variable to make the GPU only visible to one worker and then use something like canUseGPU to control the code flow.
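parpool('Processes',6);
spmd
    % Hide the GPU from every worker except worker 3
    if spmdIndex ~= 3
        setenv("CUDA_VISIBLE_DEVICES","''");
    end
end
phrase = ["will not","will"];
% canUseGPU returns true only on the worker that can still see the device
fut = parfevalOnAll(@() "This worker " + phrase(canUseGPU()+1) + " use the GPU",1);
fetchOutputs(fut)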
Output:
ans =
6×1 string array
"This worker will not use the GPU"
"This worker will not use the GPU"
"This worker will use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
"This worker will not use the GPU"
This will only work on Processes, not Threads, because thread workers share an environment (and setenv isn't supported anyway).
This workflow might mean that you do the same work on every worker, but the GPU worker finishes quicker and can therefore service more jobs than the other workers. Or you could give completely different work to the GPU worker than the others.
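To make the first of those options concrete, here is a minimal sketch in which every worker runs the same job function, but only the flagged worker moves its data to the GPU. The function runJob, the job count and the toy computation are hypothetical names and values chosen for illustration. Because parfeval hands queued jobs to whichever worker is free, a faster GPU worker would naturally pick up more of them.

```matlab
% Sketch only: save as a script; runJob and the workload are hypothetical.
pool = gcp();                        % uses the current pool (or opens the default one)
spmd, useGpu = spmdIndex == 1; end   % worker 1 is the designated GPU worker
useGpuConst = parallel.pool.Constant(useGpu);

nJobs = 20;
fut(1:nJobs) = parallel.FevalFuture; % preallocate the array of futures
for k = 1:nJobs
    fut(k) = parfeval(pool, @runJob, 1, k, useGpuConst);
end
results = fetchOutputs(fut);         % one row per job, in submission order

function r = runJob(k, useGpuConst)
    % Same code path on every worker; only the data location differs.
    x = k * ones(100, 'single');
    if useGpuConst.Value
        x = gpuArray(x);             % only the flagged worker uses the GPU
    end
    y = x * x;                       % runs on GPU or CPU depending on x
    r = gather(y(1, 1));             % gather is a no-op for CPU data
end
```

For the second option, the same flag could instead be used to route a different function to the GPU worker.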
