How are gpuArrays handled inside parfor?

Question

Garrett Good on 27 Nov 2017

Direct link to this question

https://la.mathworks.com/matlabcentral/answers/369501-how-are-gpuarrays-handled-inside-parfor

Edited: Joss Knight on 14 Feb 2025

I've been going through various posts and am still a little unsure about how gpuArrays and functions behave inside parfor loops.

FYI, what I'm not doing is trying to use multiple GPUs, or GPU workers as MATLAB workers. In my current application, each worker runs an iteration of an optimization algorithm, with the parfor code mostly executing the cost function.

An expensive part of the parfor body has some large matrix multiplications and interpolations, and I know this runs much faster on the GPU. Can multiple workers access a single GPU simultaneously (up until GPU memory becomes the bottleneck), or does this get serialized so that there's no benefit, even if a single iteration doesn't fully use the GPU?

On that note, can a constant gpuArray (or 'object-wrapped' gpuArray?) be read simultaneously, or will each worker make its own copy on the GPU so that the worker can alter it?

Many thanks in advance for your expertise!


Answer 1

Joss Knight on 27 Nov 2017

Direct link to this answer

https://la.mathworks.com/matlabcentral/answers/369501-how-are-gpuarrays-handled-inside-parfor#answer_293415

Edited: Joss Knight on 27 Nov 2017

Yes, they can all use the same GPU. By default, anything you run on the same GPU from different processes will run in serial. However, if you are also doing a lot of host-side code, the other workers can be getting on with that while they take turns with the GPU, so you can still get a benefit. Just be wary of how much memory you are using. By default, each MATLAB process will hog up to a quarter of GPU memory. If you have four or more workers and you're using a lot of memory, you could find your GPU running out.

If you are on Linux, you can run the NVIDIA Multi Process Service to allow each process to use the GPU concurrently. However, this often doesn't gain you much, because code that is using the GPU 'well' will not have any spare compute for another process. A bit like multi-threading on a single core CPU, the apparent concurrency is still bottlenecked by the fact that there's actually only one processor.
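As a minimal sketch of the pattern described in this answer (not code from the original post), the loop below mixes host-side work with GPU work inside parfor. The matrix size, the iteration count and the stand-in cost computation are assumptions made purely for illustration, and canUseGPU is used so that the sketch falls back to the CPU when no supported GPU is available.

```matlab
% Illustrative sketch only: N, n and the "cost" computation are hypothetical.
N = 8;                       % number of parfor iterations (assumed)
n = 500;                     % matrix size (assumed)
cost = zeros(1, N);
parfor k = 1:N
    A = rand(n);             % host-side setup, can overlap across workers
    if canUseGPU()
        G = gpuArray(A);     % copy this worker's data to the GPU
        P = G * G';          % GPU work; workers take turns on the shared device
        cost(k) = gather(sum(P, "all"));
    else
        P = A * A';          % CPU fallback so the sketch runs without a GPU
        cost(k) = sum(P, "all");
    end
end
```

With a process-based pool, each worker is a separate MATLAB process, so each gpuArray created in the loop is that worker's own copy in GPU memory. Total GPU memory use therefore grows with the number of workers, which is the situation the memory warning above refers to.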


Answer 2

Giorgio on 14 Feb 2025

Direct link to this answer

https://la.mathworks.com/matlabcentral/answers/369501-how-are-gpuarrays-handled-inside-parfor#answer_1559793

Hello,

I am dealing with the same idea. However, I wonder whether running just one of the workers on the GPU would speed up the process more than running all the workers on the GPU. Did you try something like that? Is it possible to do that?

Many thanks!

Giorgio

1 comment

Joss Knight on 14 Feb 2025

Edited: Joss Knight on 14 Feb 2025

Hi Giorgio. You might want to ask a separate question because this one is very old now!

Yes, there are some situations where only having one worker using the GPU might work for you.

If you are running independent jobs (e.g. parfor or parfeval) and you write code that can run either on the GPU or on the CPU, then it's just a matter of ensuring that each worker knows whether it's supposed to use the GPU. You might use spmd to do this, for example:

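parpool('Processes',6);
% Flag worker 1 as the only GPU user; useGpu is a Composite with one value per worker
spmd, useGpu = spmdIndex==1; end
% Wrap the per-worker flag so each worker can read its own value
useGpuConst = parallel.pool.Constant(useGpu);
phrase = ["will not","will"];
% Ask every worker to report which branch it would take
fut = parfevalOnAll(@(useGpu) "This worker " + phrase(useGpu.Value+1) + " use the GPU",1,useGpuConst);
fetchOutputs(fut)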

Output:

ans = 
  6×1 string array
    "This worker will use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"

Alternatively, set the CUDA_VISIBLE_DEVICES environment variable to make the GPU only visible to one worker and then use something like canUseGPU to control the code flow.

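parpool('Processes',6);
% Hide the GPU from every worker except worker 3
spmd
    if spmdIndex ~= 3
        setenv("CUDA_VISIBLE_DEVICES","''");
    end
end
phrase = ["will not","will"];
% canUseGPU returns true only on the worker that can still see the device
fut = parfevalOnAll(@() "This worker " + phrase(canUseGPU()+1) + " use the GPU",1);
fetchOutputs(fut)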

Output:

ans = 
  6×1 string array
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    "This worker will use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    "This worker will not use the GPU"
    

This will only work on Processes, not Threads, because thread workers share an environment (and setenv isn't supported anyway).

This workflow might mean that you do the same work on every worker, but the GPU worker finishes quicker and can therefore service more jobs than the other workers. Or you could give completely different work to the GPU worker than the others.
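To connect the first approach to parfor, here is a hedged sketch (an illustration, not part of the original comment) in which only worker 1 moves its data to the GPU while every worker runs the same loop body. The pool size, the matrix size and the norm-based stand-in computation are assumptions; the extra canUseGPU check is added so the sketch still runs on a machine without a supported GPU.

```matlab
% Illustrative sketch only: pool size, matrix size and the computation are hypothetical.
if isempty(gcp("nocreate"))
    parpool("Processes", 4);
end
spmd
    % Only worker 1 is allowed to use the GPU, and only if one is usable
    useGpu = spmdIndex == 1 && canUseGPU();
end
useGpuConst = parallel.pool.Constant(useGpu);

N = 12;                       % number of independent iterations (assumed)
out = zeros(1, N);
parfor k = 1:N
    A = rand(300);
    if useGpuConst.Value      % true only on the designated GPU worker
        A = gpuArray(A);
    end
    % Same code path for both cases; gather returns ordinary arrays unchanged
    out(k) = gather(norm(A * A.', "fro"));
end
```

Because parfor hands out iterations dynamically, a worker that finishes its iterations sooner is given more of them, which matches the behavior described above for the GPU worker.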
