GPU arrayfun with shared arrays

Ray on 11 Nov 2014
Edited: Matt J on 14 Nov 2014
Hi all,
I'm trying to speed up some code by using the GPU functionality that comes with arrayfun.
I know arrayfun operates element-wise. However, my function also involves some shared arrays. For example, I have a function like:
f = f(a,b,A,B,C), where a and b are (n x 1) arrays, i.e. the element-wise portion of the function. A, B, and C are arrays that remain constant across the element-wise evaluations over a and b.
I've tried searching for how to implement this, but the results don't look too promising. Is it possible to do this using arrayfun? If not, is there another way I can speed up such a function? I've tried using parfor, but this actually turned out to be slower than a normal for-loop.
Thanks,
Ray

Answers (3)

Matt J on 11 Nov 2014
Edited: Matt J on 11 Nov 2014
The only hope, I think, would be to write your own CUDA kernel implementation of f(), putting A, B, and C in constant memory if they are small enough to fit there. You could manage this through MATLAB using a CUDAKernel object and its setConstantMemory method.
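For reference, the MATLAB side of such an approach might look roughly like the sketch below. The file names evalF.cu and evalF.ptx, the kernel name evalF, its signature, and the constant symbol A are all hypothetical; the kernel itself would have to be written separately and compiled to PTX (for example with nvcc -ptx). Constant memory on CUDA devices is limited to 64 KB in total, so this only applies when the shared arrays are small.

```matlab
% Sketch only. Assumes a hypothetical kernel in evalF.cu, compiled to evalF.ptx:
%   __constant__ double A[256];
%   __global__ void evalF(double *f, const double *a, const double *b, int n)
n = 1e6;
a = rand(n, 1, 'gpuArray');
b = rand(n, 1, 'gpuArray');
A = rand(256, 1);                        % shared data (256 doubles = 2 KB)

k = parallel.gpu.CUDAKernel('evalF.ptx', 'evalF.cu', 'evalF');
setConstantMemory(k, 'A', A);            % copy A into the __constant__ symbol "A"

k.ThreadBlockSize = [256 1 1];
k.GridSize = [ceil(n / 256) 1 1];        % enough blocks to cover all n elements

% Non-const pointer arguments are treated as outputs, so f is passed in
% preallocated and returned by feval.
f = feval(k, zeros(n, 1, 'gpuArray'), a, b, int32(n));
f = gather(f);                           % copy the result back to host memory
```

B and C would be handled the same way, each with its own __constant__ symbol and its own setConstantMemory call.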

Mikhail on 11 Nov 2014
You can try to use your function without arrayfun. If at least one of the arguments is on the GPU, the calculations will be performed on the GPU.
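A minimal sketch of this vectorized route, assuming a made-up f (the original function was not posted): each element a(i) is plugged into a polynomial whose coefficients are the shared array A, and b(i) is added. The polynomial form and all values here are illustrative assumptions only.

```matlab
% Hypothetical f: f(i) = polyval(A, a(i)) + b(i), with A shared by all elements.
a = gpuArray(rand(1e5, 1));
b = gpuArray(rand(1e5, 1));
A = gpuArray([2; -1; 0.5]);              % shared coefficients, highest power first

m = numel(A);
V = bsxfun(@power, a, m-1:-1:0);         % numel(a)-by-m matrix of powers, on the GPU
f = V * A + b;                           % one matrix-vector product covers every element

f = gather(f);                           % copy the result back to host memory
```

The trade-off is memory: the intermediate matrix V has numel(a) times numel(A) entries, so this only suits cases where the shared arrays are fairly small or the work can be done in chunks.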

Edric Ellis on 12 Nov 2014
Can you give a more concrete example of what you'd like to do with A, B, and C? You might be able to use a nested function with up-level variables. This example is quite complex, but it shows some of the more advanced things you can do with nested functions and arrayfun. In particular, the nested function updateParentGrid accesses the up-level variable grid and indexes into it to perform the stencil computation.
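As a rough illustration of the nested-function idea, here is a sketch with a made-up f (the function name sharedArrayDemo, the polynomial form, and the helper elementFcn are all hypothetical). The shared array A lives in the parent function and the nested function reads it by index; on the GPU, up-level variables can be indexed in this way but not assigned to. Double-precision inputs are assumed.

```matlab
function f = sharedArrayDemo(a, b, A)
% Sketch: f(i) = polyval(A, a(i)) + b(i), evaluated element-wise on the GPU.
% A is an up-level variable shared by every element-wise call.
    A = gpuArray(A);
    m = numel(A);
    % The scalar m is expanded to match the size of a and b.
    f = arrayfun(@elementFcn, gpuArray(a), gpuArray(b), m);
    f = gather(f);                       % copy the result back to host memory

    function out = elementFcn(ai, bi, numCoeffs)
        % Up-level variables may be indexed here but not assigned to.
        acc = 0;
        for k = 1:numCoeffs
            acc = acc * ai + A(k);       % Horner's rule over the shared array
        end
        out = acc + bi;
    end
end
```

It could be called as f = sharedArrayDemo(rand(1e5,1), rand(1e5,1), [2; -1; 0.5]). Additional shared arrays such as B and C would simply be further variables in the parent function's workspace.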
  1 Comment
Matt J on 14 Nov 2014
Edited: Matt J on 14 Nov 2014
But can it be efficient to do this? I assume that there are CUDA threads doing each element-wise computation under the hood. If all threads need the variables A, B, and C, then surely those variables would need to be stored in constant memory in order for all threads to access them quickly enough.
