Why is MATLAB gpuArray sparse matrix multiplication so fast despite using double precision?

Question

Di Xiao on 29 Apr 2020

https://la.mathworks.com/matlabcentral/answers/521778-why-is-matlab-gpuarray-sparse-matrix-multiplication-so-fast-despite-using-double-precision

Commented: Di Xiao on 24 Jun 2021

I am multiplying a large sparse matrix by a dense matrix using gpuArray. On my GTX 1080, MATLAB's sparse matrix multiplication runs in 5.04 ms (only the multiplication is timed, with tic/toc):

    tic
    gpu_mmm = gpu_matrix * gpu_input;
    mvm_time = toc;

I also have a CUDA 10.2 implementation of sparse matrix multiplication using cuSPARSE, which runs the same multiplication in 7.25 ms (timed with the Nvidia profiler). However, my CUDA implementation uses float32, while the MATLAB implementation only supports sparse matrices of type double. To my knowledge, GPUs are much faster at single-precision calculations than at double-precision calculations, so I am wondering why MATLAB performs this calculation faster despite the difference in precision.

2 Comments

Thomas Barrett on 9 Feb 2021

Hi @Di Xiao. Did you get to the bottom of this? I am doing the same thing (multiplying a complex sparse matrix by a dense complex matrix) using gpuArray, and lately I have been wondering whether I would see a speedup using cuSPARSE instead. What do you think, based on your experience?

Di Xiao on 24 Jun 2021

Sorry for the late response! I redid the timing experiment on a 2080 Ti, and cuSPARSE was faster for me by around 2x.

Answer 1

Andrea Picciau on 30 Apr 2020

https://la.mathworks.com/matlabcentral/answers/521778-why-is-matlab-gpuarray-sparse-matrix-multiplication-so-fast-despite-using-double-precision#answer_429340

Hello there!

The correct way to time GPU operations is by using gputimeit.

    mvm_time = gputimeit(@() gpu_matrix*gpu_input, 1);

or, alternatively:

    gpu = gpuDevice();
    tic
    gpu_mmm = gpu_matrix * gpu_input;
    wait(gpu);   % block until the GPU has finished the multiplication
    mvm_time = toc;

I suggest you try measuring your code like this...

2 Comments

Di Xiao on 30 Apr 2020

Thanks. In this case I got the same timing using gputimeit(). I think this might be because I have the assignment after the matrix multiplication, so there is a wait to synchronize?

Andrea Picciau on 1 May 2020

GPU operations are executed asynchronously, which means that most of the time control is returned to the user right after the operations are launched. Calling wait(gpu) makes sure you're measuring the whole duration of the computation, and gputimeit does something similar under the hood.
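To make the asynchronous behaviour concrete, here is a minimal sketch that times the same multiplication three ways. The matrix sizes and density are arbitrary assumptions (the thread does not state the actual dimensions), and the script needs Parallel Computing Toolbox and a supported GPU. Whether the unsynchronized figure differs noticeably from the other two depends on the operation and the MATLAB release, so treat the printed numbers as something to inspect rather than a guaranteed outcome.

```matlab
% Assumed problem size; not the original poster's data.
n = 20000;
k = 64;
gpu_matrix = gpuArray(sprand(n, n, 0.001));  % sparse double matrix on the GPU
gpu_input  = gpuArray(randn(n, k));          % dense double matrix on the GPU
gpu = gpuDevice();

% Warm-up run so one-time initialization cost is not included in the timings.
gpu_mmm = gpu_matrix * gpu_input;
wait(gpu);

% 1) tic/toc without synchronization: may capture only the launch.
tic
gpu_mmm = gpu_matrix * gpu_input;
t_launch = toc;
wait(gpu);  % drain the queue before the next measurement

% 2) tic/toc with an explicit wait: covers the whole computation.
tic
gpu_mmm = gpu_matrix * gpu_input;
wait(gpu);
t_sync = toc;

% 3) gputimeit: synchronizes internally and repeats the call several times.
t_gputimeit = gputimeit(@() gpu_matrix * gpu_input);

fprintf('tic/toc, no wait : %.3f ms\n', 1e3 * t_launch);
fprintf('tic/toc + wait   : %.3f ms\n', 1e3 * t_sync);
fprintf('gputimeit        : %.3f ms\n', 1e3 * t_gputimeit);
```

If all three numbers come out close to each other, as Di Xiao reports above, the operation was effectively synchronized anyway; the synchronized variants are still the safer ones to compare against profiler timings.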

Answer 2

Edric Ellis on 30 Apr 2020

https://la.mathworks.com/matlabcentral/answers/521778-why-is-matlab-gpuarray-sparse-matrix-multiplication-so-fast-despite-using-double-precision#answer_429341

You should use gputimeit to time operations on the GPU (although I'm not certain it will actually make a difference in this case). Behind the scenes, gpuArray is simply using the cuSPARSE routines in double precision, so it should show basically the same performance...

2 Comments

Di Xiao on 30 Apr 2020

Interesting to know that gpuArray is just using the cuSPARSE routines in double precision. I'll look at my CUDA code to see if there's something going on there, then. In this case my sparse matrix is real, while my dense matrix is complex. In my cuSPARSE routine I do one multiplication for the real part and one for the imaginary part; maybe that leads to the difference?

Joss Knight on 2 May 2020

If you are doing two separate multiplies rather than promoting the sparse array to complex and using the cusparseCgemm routine, then that is almost certainly where the difference comes from. MATLAB is also very efficient about memory allocation, so any remaining discrepancy could be down to the way you are managing memory.
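The two approaches being compared can be sketched in MATLAB itself. This is an illustration only, not the poster's CUDA code: the sizes and density are assumptions, and the "split" variant merely mimics the idea of running one real multiply for the real part and one for the imaginary part. The CPU section checks that both routes give the same result; the GPU section runs only if a supported device is present (canUseGPU is available from R2019b).

```matlab
% Assumed sizes; the thread does not give the real dimensions.
n = 2000;
k = 64;
A = sprand(n, n, 0.01);                  % real sparse matrix (double)
B = complex(randn(n, k), randn(n, k));   % dense complex matrix

% Route 1: a single real-sparse times complex-dense multiply.
C_single = A * B;

% Route 2: two real multiplies, recombined into a complex result.
C_split = complex(A * real(B), A * imag(B));

% Both routes should agree up to floating-point rounding.
rel_err = norm(C_single - C_split, 'fro') / norm(C_single, 'fro');
fprintf('relative difference: %.2e\n', rel_err);

% Optional GPU timing of the two routes.
if canUseGPU()
    gA = gpuArray(A);   % sparse gpuArray
    gB = gpuArray(B);   % dense complex gpuArray
    t_single = gputimeit(@() gA * gB);
    t_split  = gputimeit(@() complex(gA * real(gB), gA * imag(gB)));
    fprintf('single multiply: %.3f ms\n', 1e3 * t_single);
    fprintf('two multiplies : %.3f ms\n', 1e3 * t_split);
end
```

The split route reads the sparse matrix twice and needs extra temporaries for the real and imaginary parts, which is one plausible reason it could be slower; the timings above are the way to check that on a given machine.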
