Good evening everyone,

I'm currently trying to speed up my code by converting it to GPU computation, but I'm facing a problem I didn't have on the CPU: the elapsed time doesn't scale linearly with the number of loop iterations.

Here is the code:

    Nloops = 500;
    dt = 1e-1;   % time step
    n = 3e3;     % number of vortices
    z1 = complex(gpuArray.randn(1,n),gpuArray.randn(1,n));   % vortex positions
    C1 = repmat(1e-2*gpuArray.randn(n,1),1,n);               % circulation
    D = gpuArray.eye(n);
    nD = ~D;

    tic
    for ii = 1:Nloops
        Z1 = repmat(z1,n,1);
        z1 = z1 + (dt*0.5i/pi) * (sum((C1.*nD)./(Z1-Z1.'+D),1));
    end
    toc

The result is quite surprising:

Nloops = 100: Elapsed time is 0.014950 seconds.

Nloops = 500: Elapsed time is 17.072178 seconds.

On the CPU it scales well (3 and 15 seconds respectively). Do you have any idea why it scales so badly on the GPU?

Matt J
on 19 Dec 2017

tic and toc are not accurate measures of GPU execution time. Use gputimeit() instead.
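If you still want a quick tic/toc check, one possible approach is to synchronize explicitly before reading the timer. GPU operations in MATLAB are generally queued asynchronously, so toc can return while work is still pending; wait(gpuDevice) blocks until the device is idle. A minimal sketch, assuming a supported GPU and Parallel Computing Toolbox, with a small n and a placeholder workload rather than the original vortex update:

```matlab
% Sketch: timing GPU work with tic/toc plus explicit synchronization.
% The workload below is a placeholder chosen for illustration only.
n = 500;                          % small size so the example runs quickly
A = gpuArray.randn(n);
dev = gpuDevice;                  % handle to the currently selected GPU

wait(dev);                        % make sure earlier work has finished
tic
for ii = 1:100
    A = A ./ (1 + abs(A));        % placeholder elementwise update
end
wait(dev);                        % block until the queued work is done
tSync = toc;

% gputimeit takes care of synchronization and repetition itself
tAuto = gputimeit(@() A ./ (1 + abs(A)));

fprintf('tic/toc with wait: %.4f s, gputimeit (one step): %.6f s\n', tSync, tAuto);
```

Without the second wait(dev), the tic/toc figure may mostly reflect how long it took to queue the work, which would be consistent with timings that look far too small for a low iteration count.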

Walter Roberson
on 19 Dec 2017

I am using a 650M card (2012 time frame) with R2017b.

I tried adding a gather() to force the computation to finish, but it did not seem to make any difference. However, either way (with or without gather), if I run the tests close together each run takes additional time, whereas if I wait between runs the timing is faster. This suggests the GPU might not have finished the earlier work (even with the gather).

Note: in the source below, I use 1e3 vortices rather than 3e3, so as to avoid filling my GPU memory.
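To give a feel for the sizes involved, here is a rough per-matrix estimate. It only counts the storage of a single n-by-n double array; how many temporaries (the transpose, the difference, the quotient) exist at the same time during the update is not something this sketch tries to determine.

```matlab
% Rough memory estimate for one n-by-n array in the vortex update.
bytesReal    = 8;                 % double precision
bytesComplex = 16;                % complex double: real + imaginary part
n = [1e3 3e3];                    % the two sizes used in this thread

mbReal    = n.^2 * bytesReal    / 2^20;   % e.g. C1, D
mbComplex = n.^2 * bytesComplex / 2^20;   % e.g. Z1 and the complex temporaries

for k = 1:numel(n)
    fprintf('n = %d: %.1f MiB per real matrix, %.1f MiB per complex matrix\n', ...
        n(k), mbReal(k), mbComplex(k));
end
```

Going from 1e3 to 3e3 multiplies every n-by-n array by nine, so each complex matrix grows from roughly 15 MiB to roughly 137 MiB.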

    function testtime
        dt = 1e-1;   % time step
        n = 1e3;     % number of vortices
        z1 = complex(gpuArray.randn(1,n),gpuArray.randn(1,n));   % vortex positions
        C1 = repmat(1e-2*gpuArray.randn(n,1),1,n);               % circulation
        D = gpuArray.eye(n);
        nD = ~D;

        gputimeit(@()gather(LoopPart(z1)))

        function z1 = LoopPart(z1)
            Nloops = 500;
            for ii = 1:Nloops
                Z1 = repmat(z1,n,1);
                z1 = z1 + (dt*0.5i/pi) * (sum((C1.*nD)./(Z1-Z1.'+D),1));
            end
        end
    end
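On the memory side, one thing that might help is dropping the explicit repmat copies and relying on implicit expansion instead. This is only a sketch: it assumes R2016b or later, it is checked here on ordinary CPU arrays with a small n, and the circulation vector c is a name introduced for the illustration. The same expressions are expected to work with gpuArray inputs, but I have not timed that.

```matlab
% Sketch: one update step with and without the explicit repmat copies.
% Uses implicit expansion, so it assumes R2016b or later.
rng(0);                                  % reproducible illustration
n  = 200;                                % small size for a quick check
dt = 1e-1;                               % time step
z1 = complex(randn(1,n), randn(1,n));    % vortex positions (row vector)
c  = 1e-2*randn(n,1);                    % circulations (column vector, hypothetical name)
D  = eye(n);
nD = ~D;

% Original formulation with full n-by-n copies of c and z1
C1 = repmat(c,1,n);
Z1 = repmat(z1,n,1);
zA = z1 + (dt*0.5i/pi) * sum((C1.*nD)./(Z1 - Z1.' + D), 1);

% Same step: z1 - z1.' expands a 1-by-n row against an n-by-1 column
zB = z1 + (dt*0.5i/pi) * sum((c.*nD)./(z1 - z1.' + D), 1);

fprintf('max difference: %g\n', max(abs(zA - zB)));
```

This removes the stored C1 and Z1 matrices, although the n-by-n intermediate results inside the expression are still created on every iteration.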

Joss Knight
on 20 Dec 2017
