Fast gpuArray slicing for cart2sph

Question

Tim on 14 Jul 2020


https://la.mathworks.com/matlabcentral/answers/565073-fast-gpuarray-slicing-for-cart2sph

Edited: Tim on 15 Jul 2020

I have a 3 x N gpuArray, where N is very large and dimension 1 holds x, y and z. These points are the result of a viewpoint transformation: the 3 x N matrix is multiplied by a 3 x 3 rotation matrix and a translation vector is added. At this point I need to convert to spherical coordinates; however, to do this I have to slice the array into its x, y and z components before passing them as separate inputs to cart2sph (or to subfunctions such as atan2 if I want to write my own version).
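For reference, the viewpoint transformation described above might look like the following minimal sketch. The rotation angle, translation vector and N are hypothetical values chosen for illustration; implicit expansion (R2016b or later) adds the 3 x 1 translation to every column.

```matlab
% Hypothetical sizes and values, for illustration only.
N = 1e5;
X = randn(3, N, 'single');     % append 'gpuArray' to keep the data on the GPU
ang = pi/6;                    % hypothetical rotation about the z axis
R = single([cos(ang) -sin(ang) 0; sin(ang) cos(ang) 0; 0 0 1]);
t = single([1; 2; 3]);         % hypothetical translation vector
P = R*X + t;                   % still 3 x N: rows are x, y, z
```

The result is still 3 x N with x, y and z along the rows, which is what makes the subsequent row slicing necessary.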

The problem is that slicing the array to feed cart2sph takes much longer than the viewpoint transformation and the coordinate transform combined. The only way I've found to speed this up is to replace the slicing with an element-wise multiply-and-sum operation, which for some reason is faster than simply slicing. Here's some example code:

tst = randn(3, 6000000, 'single', 'gpuArray');
% Approach 1: index each row directly
tic
for n = 1:100
    tst1 = tst(1, :);
    tst2 = tst(2, :);
    tst3 = tst(3, :);
    [th, phi, r] = cart2sph(tst1, tst2, tst3);
end
wait(gpuDevice);   % let queued GPU work finish before stopping the timer
toc
% Approach 2: select each row by element-wise multiply and sum
tic
for n = 1:100
    tst1b = sum(tst.*[1;0;0]);
    tst2b = sum(tst.*[0;1;0]);
    tst3b = sum(tst.*[0;0;1]);
    [th, phi, r] = cart2sph(tst1b, tst2b, tst3b);
end
wait(gpuDevice);
toc

On my computer the first loop takes ~1.5 seconds and the second ~0.5 seconds. The element-wise multiply-and-sum is about 3x faster than simply slicing the array, and cart2sph itself takes only a trivial amount of time. So my questions are:

1) Is there a faster way to get from a 3xN xyz array to a 3xN phi-theta-r array, preferably that does not require a set of (very slow) slicing operations?

2) Why is a multiply and sum operation faster than a simple slicing operation?
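As an alternative to tic/toc with wait(gpuDevice), the two approaches can be compared with timeit, or with gputimeit (Parallel Computing Toolbox), which waits for the GPU to finish its queued work before recording each time. This is a minimal sketch: it assumes R2019b or later for canUseGPU, falls back to the CPU when no supported GPU is present, and uses an arbitrary array size.

```matlab
% Choose GPU or CPU depending on what is available.
useGPU = canUseGPU();
tst = randn(3, 1e6, 'single');     % arbitrary size for illustration
if useGPU
    tst = gpuArray(tst);
    timer = @gputimeit;            % synchronizes the GPU around each run
else
    timer = @timeit;
end
% Approach 1: row slicing; Approach 2: element-wise multiply and sum.
rowSlice = @() cart2sph(tst(1, :), tst(2, :), tst(3, :));
mulSum   = @() cart2sph(sum(tst.*[1;0;0]), sum(tst.*[0;1;0]), sum(tst.*[0;0;1]));
tRow = timer(rowSlice, 3);         % request all three outputs of cart2sph
tSum = timer(mulSum, 3);
fprintf('row slicing: %.4f s, multiply-and-sum: %.4f s\n', tRow, tSum);
```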

Thank you


Answer 1

Edric Ellis on 15 Jul 2020


https://la.mathworks.com/matlabcentral/answers/565073-fast-gpuarray-slicing-for-cart2sph#answer_466010


Like all arrays in MATLAB, gpuArray data is stored in "column-major" order. One consequence is that extracting individual columns from a matrix is much more efficient than extracting individual rows; this is true on the CPU, and doubly true on the GPU. The elements of a column are adjacent in memory, so extracting a column is equivalent to a simple memory block copy. Consecutive elements of a row are separated in memory by the number of rows (3 in your case), so extracting a row requires a "strided" copy operation. You can take advantage of this by performing a single up-front transpose on your array and then using efficient column indexing operations:

tst_t = tst.';                 % one up-front transpose: N x 3
for n = 1:100
    tst1 = tst_t(:, 1).';      % contiguous column copy, then a cheap vector transpose
    tst2 = tst_t(:, 2).';
    tst3 = tst_t(:, 3).';
    [th, phi, r] = cart2sph(tst1, tst2, tst3);
end

On my machine, with a now rather old Tesla K20c, this approach takes 0.015 seconds, compared with 5.3 seconds for your first approach and 2.0 seconds for your second. (Note that transposing vectors can be extremely efficient, because a 1 x N row vector and an N x 1 column vector have identical memory layouts.)
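To make the layout argument concrete, here is a small CPU-only illustration with an arbitrary N. sub2ind returns the linear memory positions read when taking row 1 of a 3 x N array versus column 1 of its N x 3 transpose; the values are the same, but only the column read is contiguous.

```matlab
% Values of A equal their own linear (memory) index, 1..3*N.
N = 5;
A  = reshape(1:3*N, 3, N);    % 3 x N, stored column-major
At = A.';                     % N x 3 transpose
rowIdx = sub2ind(size(A), ones(1, N), 1:N)    % row 1 of A: 1 4 7 10 13 (stride 3)
colIdx = sub2ind(size(At), 1:N, ones(1, N))   % column 1 of At: 1 2 3 4 5 (contiguous)
sameValues = isequal(A(1, :), At(:, 1).')     % true: same data, different layout
```

The stride between row elements equals the number of rows, which is why a row extraction cannot be done as one block copy.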


Tim on 15 Jul 2020

Edited: Tim on 15 Jul 2020

Brilliant, Edric; I was hoping you would weigh in. That makes sense and works on my system, although for some reason my card (RTX 2070) isn't quite as fast as yours for the accelerated version (~0.02 seconds), even though it is ~3x faster for the first two versions. The actual problem I am working on uses much larger 3 x N x M arrays which are created inside the loop, so to make this work I have to bring the transpose inside the loop and add several reshapes, which adds some overhead (mostly from the transpose, it seems). Even so, it is still many times faster than my original versions. Many thanks,
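For 3 x N x M data, one possible variant (a sketch with hypothetical sizes, not benchmarked against the actual pipeline) replaces the transpose-plus-reshapes with a single permute that moves the xyz dimension last. Each N x M page is then one contiguous block, and the three outputs can be stacked back into a 3 x N x M array. permute still reorders data in memory, so expect a cost similar to that of the transpose.

```matlab
% Hypothetical sizes, for illustration only.
N = 1000; M = 50;
X = randn(3, N, M, 'single');        % append 'gpuArray' for GPU data
P = permute(X, [2 3 1]);             % N x M x 3: each page is contiguous
[az, el, r] = cart2sph(P(:, :, 1), P(:, :, 2), P(:, :, 3));  % each N x M
S = permute(cat(3, az, el, r), [3 1 2]);                     % back to 3 x N x M
```

If downstream code can consume N x M x 3 data directly, the final permute can be dropped.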
