Matrix multiplication bug in GPU

I am using MATLAB 8.2.0.701 (R2013b) on a host with 64 AMD cores and 2 K20c GPUs. The NVIDIA driver version is 331.62, on Ubuntu 12.04.4 LTS.
$ uname -a
Linux leibniz3 3.5.0-44-generic #67~precise1-Ubuntu SMP Wed Nov 13 16:16:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
The matrix multiplication on the GPU returns results that differ substantially from the CPU for matrices of size 2^13x2^13.
To replicate, simply run
clear
n = 2^13;
A = rand(n);
B = rand(n);
tic
C = A * B;
t = toc; fprintf('CPU time %f sec\n',t)
%% One GPU
gpuDevice(1); % reset device
tic;
Ag = gpuArray(A);
Bg = gpuArray(B);
C1 = gather(Ag * Bg);
t = toc; fprintf('1 GPU time %f sec\n',t)
%% Two GPUs
gpuDevice(1); % reset device
gpuDevice(2); % reset device
tic
cc = cell(2,1);
parfor i = 1:2
    dev = gpuDevice;
    % fprintf('Iter %d Device %d\n',i,dev.Index);
    Ag = gpuArray(A);
    Bg = gpuArray(B(:,(i-1)*n/2+1:i*n/2));
    cc{i} = gather(Ag * Bg);
end
C2 = [cc{1} cc{2}];
t = toc; fprintf('2 GPU time %f sec\n',t)
fprintf('n = %5d %f %f\n', n, ...
    max(max(abs(C - C1))), max(max(abs(C - C2))))
The error is substantial. Is this known behavior?
The code works for smaller powers of two; 2^13 is the first that causes the bug to rear its ugly head. I did not check other values, but I will be glad to.
With 1 GPU, the difference max(max(abs(C - C1))) is 0.999716. With 2 GPUs, the difference max(max(abs(C - C2))) is 134.766785.
The difference is very large!
Here are the plots. The second is a zoomed view: at full size the difference is invisible, because it appears to lie along a boundary.
[Two plots: the difference at full size, and a zoomed view]
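To check whether the discrepancy really sits along a boundary, one option is to list the rows and columns where the two results disagree. The sketch below is only an illustration: it uses a small synthetic matrix with an injected error as a stand-in for C and C1, and the tolerance is an assumed value, not something taken from the run above.

```matlab
% Synthetic stand-ins for the CPU result C and the gathered GPU result C1.
% These are NOT the real data; replace them with the matrices from the script.
n  = 8;
C  = magic(n);
C1 = C;
C1(5:8, 3) = C1(5:8, 3) + 1;        % injected fake discrepancy

D   = abs(C - C1);
tol = 1e-10 * max(abs(C(:)));       % assumed tolerance, scaled to the data
[badRow, badCol] = find(D > tol);

if isempty(badRow)
    fprintf('No entries differ by more than %g\n', tol);
else
    fprintf('%d entries differ; rows %d..%d, columns %d..%d\n', ...
        numel(badRow), min(badRow), max(badRow), min(badCol), max(badCol));
end

% A relative measure is easier to compare across matrix sizes.
relErr = norm(C - C1, 'fro') / norm(C, 'fro');
fprintf('Relative Frobenius error: %g\n', relErr);
```

If the bad rows or columns form a contiguous band at a multiple of some block size, that would be consistent with the boundary pattern seen in the zoomed plot.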
I will try your suggestions and follow up on this.

3 Comments

I don't have the Parallel Computing Toolbox, but am curious as to what the errors are for
max(max(abs(C - C1)))
max(max(abs(C - C2)))
max(max(abs(C1 - C2))) % this one is not included in the fprintf
As well, does the above code work for matrices of smaller sizes? Is it only for 8192x8192 matrices that the code starts to fail?
Is this line of code
C1 = gather(Ag * Bg);
equivalent to
Cg = Ag*Bg;
C1 = gather(Cg);
?
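Both questions above can be checked with one script. This is a rough sketch, not something run on the poster's machine: the size range 2^8 to 2^13 is an assumed choice, and the script skips the GPU work entirely if no device (or no Parallel Computing Toolbox) is found. I would expect the one-line and two-step gather forms to give identical values, since gather only copies data back to the host, but the script tests that rather than assuming it.

```matlab
sizes      = 2.^(8:13);              % assumed range of sizes to test
errs       = nan(size(sizes));       % max abs CPU-vs-GPU difference per size
sameResult = true(size(sizes));      % do the two gather forms agree?

haveGPU = false;
try
    haveGPU = gpuDeviceCount > 0;    % errors if the toolbox is missing
catch
    % No Parallel Computing Toolbox: leave haveGPU false.
end

if haveGPU
    for k = 1:numel(sizes)
        n = sizes(k);
        A = rand(n);  B = rand(n);
        C = A * B;                   % CPU reference

        Ag = gpuArray(A);  Bg = gpuArray(B);
        C1  = gather(Ag * Bg);       % one-line form
        Cg  = Ag * Bg;               % two-step form
        C1b = gather(Cg);

        errs(k)       = max(abs(C(:) - C1(:)));
        sameResult(k) = isequal(C1, C1b);
        fprintf('n = %5d  max|C - C1| = %g  forms agree: %d\n', ...
            n, errs(k), sameResult(k));
    end
else
    fprintf('No GPU available; nothing was compared.\n');
end
```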
Edric Ellis on 2 Jul 2014
I can't reproduce the problem you're seeing in R2013b - but I have only a single K20c. Can you reproduce the problem using only a single GPU? Which OS are you using? Have you updated to the latest NVIDIA CUDA driver? Are you able to try R2014a (this includes a later version of the CUDA runtime libraries)?
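One way to test each GPU on its own, without parfor in the picture, is to select the devices one after another in a plain loop and run the full product on each. A minimal sketch, assuming the same n = 2^13 as in the question; it does nothing if no GPU is found. Note that gpuDevice(d) both selects and resets device d, so any gpuArray data must be gathered before moving to the next device.

```matlab
n    = 2^13;
nDev = 0;
try
    nDev = gpuDeviceCount;           % errors if the toolbox is missing
catch
    % No Parallel Computing Toolbox: leave nDev at 0.
end

maxErr = nan(1, nDev);               % one entry per device
if nDev > 0
    A = rand(n);  B = rand(n);
    C = A * B;                       % CPU reference
    for d = 1:nDev
        gpuDevice(d);                % select (and reset) device d
        Cd = gather(gpuArray(A) * gpuArray(B));
        maxErr(d) = max(abs(C(:) - Cd(:)));
        fprintf('Device %d: max|C - Cd| = %g\n', d, maxErr(d));
    end
else
    fprintf('No GPU available; nothing was compared.\n');
end
```

If only one of the two devices shows a large error, that would point at that card or its slot rather than at MATLAB; if both are fine here but the parfor version is not, the worker setup becomes the suspect.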
I am also unable to reproduce this on a single K20c in R2013b. I'm running a 12 core Debian machine with GPU driver version 331.62. On my system I see reasonable agreement between the CPU and GPU results:
max(max(abs(C-C1))) = 10^(-11)
As Edric mentioned, are you able to try R2014a to see if the problem is still reproducible for you in that version?
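For a sense of scale, here is a rough worst-case rounding bound for this problem. It is a sketch under stated assumptions: IEEE double precision, a conventional inner-product algorithm on both CPU and GPU (in any summation order, but not a Strassen-type method), and entries in (0,1) as produced by rand. Under those assumptions each entry of the computed product is off by at most gamma_n times the sum of |a||b|, and that sum is below n.

```matlab
n      = 2^13;
u      = eps / 2;                    % unit roundoff in double precision
gammaN = n * u / (1 - n * u);        % standard inner-product error constant
bound  = gammaN * n;                 % per-entry bound, since sum |a||b| < n

% CPU and GPU can each be off by up to 'bound', so their difference
% is at most twice that.
diffBound = 2 * bound;
fprintf('Worst-case |C - C1| from rounding alone: about %g\n', diffBound);
```

A difference near 10^(-11) sits comfortably inside this bound, while 0.999716 or 134.766785 is many orders of magnitude beyond anything rounding could explain.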

Answers (0)

This question is closed.

Asked:

1 Jul 2014

Closed:

20 Aug 2021
