MATLAB Answers

Fetching outputs from different GPUs results in an error?

Srinidhi Ganeshan on 27 Jan 2019
Edited: Joss Knight on 30 Jan 2019
I have two GPUs in my computer and want to use both of them to run a function. To do that, I send part of the array to one GPU and the rest to the second GPU.
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@Function,2,Agpu1,1);
F(2)=parfeval(@Function,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
When I fetch the outputs with the last statement, I get the error "Error using parallel.Future/fetchOutputs: One or more futures resulted in an error".
1) Does this mean fetchOutputs is trying to fetch the output while the other GPU is still performing the operation? How can I solve this?
In the above link, when I try printing the gpuDevice in use, it always shows that GPU 2 is being used and GPU 1 is idle. How can I confirm that both GPUs are being used?
Thank you!

  3 Comments

Joss Knight on 28 Jan 2019
First, what is the error? Display F.Error to find out. Secondly, how many workers in your pool? If there's only one worker, then of course every call to parfeval will use the same GPU.
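As a rough sketch of those two checks: the snippet below creates a stand-in future that fails on purpose (the `demo:fail` identifier and message are made up for illustration), then prints the error stored on the future and the size of the current pool. With the futures from the question, the same loop would run over the existing F instead.

```matlab
% Stand-in future that fails deliberately; replace with your own F.
F = parfeval(@() error('demo:fail', 'Deliberate failure'), 0);
wait(F);                         % Error is only populated once the future has finished

for k = 1:numel(F)
    err = F(k).Error;            % empty if the future completed without error
    if ~isempty(err)
        fprintf('Future %d failed: %s\n', k, err.message);
    end
end

pool = gcp('nocreate');          % returns [] if no pool is open
if ~isempty(pool)
    fprintf('Pool has %d workers\n', pool.NumWorkers);
end
```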
Thirdly, GPUs are dealt out to pool workers in a round-robin fashion, but parfeval gives you no ability to select which worker will be used. If you have four pool workers and two GPUs, and you invoke parfeval twice, you might get workers 1 and 3, which will have the same GPU selected.
One solution is to select the device manually in your function using gpuDevice, which will ensure a particular GPU is used. (By the way, I hope your function isn't actually called function because that's a keyword.)
Another would be to open a pool with a single worker, and use the client for the other half of the computation. This would help with data transfer since you don't need to transfer half of the array to another process.
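A minimal sketch of that single-worker idea, under some assumptions: the machine has two GPUs with indices 1 and 2, the data is a stand-in random array, and `pagewiseQR` is a hypothetical helper rather than anything from the original code. The worker receives an ordinary CPU array and moves it to its own GPU, while the client processes the other half in the meantime.

```matlab
A = rand(64, 64, 8);                     % stand-in data for illustration
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool('local', 1);          % a single worker
end

half = floor(size(A,3)/2);
% Worker: second half, sent as a CPU array, computed on GPU 2 (assumed index).
f = parfeval(pool, @pagewiseQR, 2, A(:,:,half+1:end), 2);
% Client: first half on GPU 1 (assumed index) while the worker is busy.
[q1, r1] = pagewiseQR(A(:,:,1:half), 1);
[q2, r2] = fetchOutputs(f);              % blocks until the worker is done
Q = cat(3, q1, q2);
R = cat(3, r1, r2);

function [q, r] = pagewiseQR(A, id)
% Hypothetical helper: economy-size QR of every page on GPU number id.
d = gpuDevice;                           % query the current device, no reset
if d.Index ~= id
    gpuDevice(id);                       % switching devices resets the new one
end
for i = size(A,3):-1:1
    [qg, rg] = qr(gpuArray(A(:,:,i)), 0);
    q(:,:,i) = gather(qg);
    r(:,:,i) = gather(rg);
end
end
```

The device is selected before any gpuArray is created inside the helper, so nothing that the function needs is cleared by the switch.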
The 'correct' solution (if there really is one) is to use SPMD, since you want both workers to be doing exactly the same thing with different data. As long as you have a pool of 2 workers you will guarantee that both are using different GPUs, and you won't even need to have a separate function. Again, no point in copying the data to the GPU before opening the SPMD block, because that will in fact slow down the data transfer rather than speeding it up.
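To confirm which GPU each worker has selected, something along these lines could be run inside an SPMD block. This is only a sketch: it assumes a pool with one worker per GPU and uses labindex and numlabs, which newer MATLAB releases replace with spmdIndex and spmdSize.

```matlab
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool('local', gpuDeviceCount);   % one worker per GPU
end

spmd
    d = gpuDevice;               % queries the selected device without resetting it
    idx = d.Index;
    fprintf('Worker %d of %d is using GPU %d (%s)\n', ...
        labindex, numlabs, idx, d.Name);
end

% idx is a Composite on the client, one entry per worker.
usedDevices = [idx{:}];
fprintf('Distinct GPUs in use: %d\n', numel(unique(usedDevices)));
```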
Srinidhi Ganeshan on 29 Jan 2019
Below is the code :
for i=1:500
A(:,:,i)=rand(500,500);
end
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1);
F(2)=parfeval(@fcn,2,Agpu2,2);
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
function [q,r]=fcn(A,Id)
if nargin>1, gpuDevice(Id);end
for i=size(A,3):-1:1
[q(:,:,i),r(:,:,i)]=qr(A(:,:,i),0);
end
end
1) a) Error:
ans =
ParallelException with properties:
identifier: 'parallel:gpu:array:InvalidData'
message: 'The data no longer exists on the device.'
cause: {}
remotecause: {[1x1 MException]}
stack: [1x1 struct]
2) I am using 16 workers. In this case, how will parfeval use the GPUs?
3) In my program, I selected different GPUs using the gpuDevice ID. When I do that and run my program, I get an error at line 5, i.e. at fetchOutputs. The error message is shown above.
4) Thanks for mentioning that. The function is not called function in my program.
5) How do I do "Another would be to open a pool with a single worker, and use the client for the other half of the computation. This would help with data transfer since you don't need to transfer half of the array to another process."? Is there a small example you could provide?
6) To solve (3), I tried using wait, one of the methods of parallel.FevalFuture, this way:
Agpu1=gpuArray(A(:,:,1:n/2)); %chunk #1 : send to GPU with device index 1
Agpu2=gpuArray(A(:,:,n/2+1:n)); %chunk #2 : send to GPU with device index 2
F(1)=parfeval(@fcn,2,Agpu1,1);
F(2)=parfeval(@fcn,2,Agpu2,2);
wait(F,'finished');
[o1,o2] = fetchOutputs(F,'UniformOutput',false); % Blocks until complete
I still get the same error.
I also tried using fetchNext, so that each completed job arrives when it is done:
Q1=cell(1,2);
R1=cell(1,2);
for idx=1:2
[completedIdx,Q,R] = fetchNext(F);
disp(completedIdx);
Q1{completedIdx}=Q;
R1{completedIdx}=R;
end
toc
Q=cat(3,gather(Q1{1}),gather(Q1{2}));
R=cat(3,gather(R1{1}),gather(R1{2}));
Even though I do this, I get the same error, stating
One or more future results resulted in an error.
What should I do to solve this?
To sum up, I am planning to do a small part of my QR on the CPU and split the rest between the GPU devices, so that the CPU, GPU 1 and GPU 2 are all kept busy at the same time.
Joss Knight on 30 Jan 2019
You can try to use the same GPUs on more than one parallel worker, but it's pointless - the work will happen in serial. If you have two GPUs, open a pool with two workers. If you want to do some work on the GPU and some on the CPU, take a look at the answer to this question.
The error is a pretty simple one. Every time you select the device using gpuDevice, you are resetting it, clearing all gpuArray variables in memory, including the ones you passed in. As I said, there is no point in moving the data to the GPU on the client MATLAB and then sending it to your worker in a parfeval call. All that happens is that the data gets transferred back to the system memory, then transmitted to the other process, then deserialised and put back on whatever device is currently selected. Create your data on your worker or send it as a CPU array and then transfer it to the GPU at the other end. You could also try using a parallel.pool.Constant to define data on your workers that persists from call to call.
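One way the parfeval version might look with that advice applied is sketched below. It assumes a pool with at least two workers, two GPUs with indices 1 and 2, and an even number of pages; the data is a stand-in random array and `fcnOnDevice` is a hypothetical variant of fcn. The key difference is that CPU arrays are passed to parfeval, and the device is selected before anything is placed on it.

```matlab
A = rand(64, 64, 8);                           % stand-in data for illustration
n = size(A, 3);

% Pass plain CPU arrays; the transfer to the GPU happens on the worker.
F(1) = parfeval(@fcnOnDevice, 2, A(:,:,1:n/2),   1);
F(2) = parfeval(@fcnOnDevice, 2, A(:,:,n/2+1:n), 2);
[q, r] = fetchOutputs(F, 'UniformOutput', false);  % cell arrays, one cell per future
Q = cat(3, q{:});
R = cat(3, r{:});

function [q, r] = fcnOnDevice(A, id)
% Hypothetical variant of fcn: select the device first, then upload the data.
gpuDevice(id);               % the reset happens before any gpuArray is created here
Agpu = gpuArray(A);
for i = size(Agpu,3):-1:1
    [qi, ri] = qr(Agpu(:,:,i), 0);
    q(:,:,i) = gather(qi);   % return CPU arrays so the client can use them directly
    r(:,:,i) = gather(ri);
end
end
```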
If I was trying to do pagewise QR like you are on two GPUs I'd probably use SPMD, and I probably would limit the GPU work to just the call to qr - there's no advantage to all that indexing and storage on the GPU, I don't think:
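parpool('local', gpuDeviceCount); % one worker per GPU
spmd
    nPages = size(A,3);
    blocksize = ceil(nPages/numlabs); % pages handled by each worker
    strt = (labindex-1)*blocksize + 1; % first page of this worker's block
    fnsh = min(nPages, strt+blocksize-1); % last page; the -1 stops blocks overlapping
    for j = fnsh:-1:strt
        Agpu = gpuArray(A(:,:,j)); % only the current page goes to the GPU
        [qgpu,rgpu] = qr(Agpu, 0);
        i = j-strt+1; % index of the page within this worker's block
        q(:,:,i) = gather(qgpu);
        r(:,:,i) = gather(rgpu);
    end
end
% q and r are now Composites so need to be indexed to recreate result
Q = cat(3, q{:});
R = cat(3, r{:});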
By the way, I hope you're not actually doing this
for i=1:500
A(:,:,i)=rand(500,500);
end
Since it's just the same as A = rand(500,500,500), but way slower.


Answers (0)
