Reset GPU & Clear its Memory

Dan Johnson on 19 Jan 2017
Commented: Vitaly Bur on 29 Oct 2020
I'm running simulations and computations in MATLAB using some reasonably big data sets, and the bulk of the work is done on the GPU. I can only get through about a third of the work I need to do before I receive an error saying the GPU memory is full:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_OUT_OF_MEMORY
I've had this problem for a while, and have tried to get around it by resetting the GPU between each simulation, using any and all of the following:
gpuDevice;
gpuDevice(1);
reset(gpuDevice(1));
wait(gpuDevice(1));
None of these work, whether on their own or combined, and none of them work if I try them after my simulations have crashed out. There seems to be no effective way to reset/flush the GPU other than rebooting my computer.
I'm getting work done this way, but it's slow, and annoying, and means I can't just leave my code running over the weekend as I'd like to - only half of it gets done. I'm sure there must be a way to reset the GPU in MATLAB, and if one of the methods I've tried is correct, what am I doing wrong?
Any ideas?
EDIT: Problem occurs on both R2016a and the R2017a Prerelease.
  4 comments
Dan Johnson on 23 Jan 2017
Edited: Dan Johnson on 23 Jan 2017
Thanks for the comments. I'm running a GeForce GTX 960.
I'd love to provide you with an example, but short of copying out my entire codebase I'm not sure what I could post that would be helpful. Here's the code I execute for each data run (I've renamed the functions for clarity):
for m = 1:8
    inputVars = CreateVars();
    SimulateData(inputVars);
    for n = 1:50
        [outputVars] = RunReconstruction(inputVars);
        save([savePath(m,n)],'outputVars');
    end
    close all; clear;
end
NOTE: 1. RunReconstruction() gathers the "outputVars" before passing them back. 2. I typically get to m=4 before I get the CUDA error.
Joss Knight on 20 Jul 2017
I think you're going to have to create a minimal reproduction, a condensed version of your code; otherwise it's impossible to diagnose. Also see below for advice about monitoring your memory usage.


Answers (2)

Joss Knight on 23 Jan 2017
Presumably your simulations are adding results continually to some output variables, which are getting larger and larger. Try gathering your results back to the CPU so that you're not clogging up GPU memory with data that isn't being used for computation any more.
  3 comments
Joss Knight on 20 Jul 2017
No, MATLAB releases variables as soon as they are no longer referenced. But it's common for users to run scripts rather than functions, and to aggregate results into a big output array that sits in their MATLAB workspace, e.g.
results(end+1,:) = myNewResults;
Why don't you run your simulation and monitor GPU memory in a separate terminal or command window using nvidia-smi, something like:
nvidia-smi -l 1 -q -d MEMORY
If memory usage is continually going up then you've got some sort of problem with your simulation not releasing variables.
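When the run is meant to go unattended over a weekend, it can be easier to log the readings to a file and check them afterwards than to watch a terminal. nvidia-smi also has a CSV query mode; for example `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1 > mem.log` writes one memory.used value in MiB per GPU every second. The C++ sketch below is one way to check such a log, assuming a single GPU; the function names and the log file name are hypothetical. Because temporaries come and go, raw readings fluctuate even when nothing leaks, so the sketch compares the lowest reading in successive windows: if that floor keeps rising, something is probably not being released. It is a coarse heuristic, not a definitive leak test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse a log written by (single GPU assumed):
//   nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits -l 1
// Each line holds one memory.used reading in MiB. Parsing stops at the
// first token that is not an integer.
std::vector<long> parseMemoryLog(const std::string& text) {
    std::vector<long> samples;
    std::istringstream in(text);
    long value = 0;
    while (in >> value) {
        samples.push_back(value);
    }
    return samples;
}

// Hypothetical leak heuristic: split the readings into consecutive windows
// of `window` samples (ideally about one simulation run each) and check
// whether the lowest reading in every window is higher than in the window
// before. Usage that falls back to the same floor between runs is not
// flagged; a floor that keeps climbing is. A trailing partial window is
// ignored.
bool floorKeepsRising(const std::vector<long>& samples, std::size_t window) {
    if (window == 0 || samples.size() < 2 * window) {
        return false;  // need at least two full windows to compare
    }
    bool haveFloor = false;
    long previousFloor = 0;
    for (std::size_t start = 0; start + window <= samples.size(); start += window) {
        auto first = samples.begin() + static_cast<std::ptrdiff_t>(start);
        long windowFloor =
            *std::min_element(first, first + static_cast<std::ptrdiff_t>(window));
        if (haveFloor && windowFloor <= previousFloor) {
            return false;  // the floor did not rise in this window
        }
        previousFloor = windowFloor;
        haveFloor = true;
    }
    return true;
}
```

The window size is the main design choice: at one sample per second, a window of 60 covers about a minute, but the right value depends on how long one simulation run takes, which the thread does not state.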
Vitaly Bur on 29 Oct 2020
I have the same problem with clearing GPU memory. After executing the following code, 2 GB of GPU memory is in use, even though only the D matrix is left on the GPU:
A=fix(gpuArray(rand(1,1000))*99)+1;
B=fix(gpuArray(rand(1,1000))*99)+1;
C=gpuArray(rand(100000,100));
E=C(:,A);
F=C(:,B);
D=E.*F;
clear E F C A B
However, if I execute this code instead:
D=gpuArray(rand(100000,1000));
There is again a D matrix of the same size on the GPU, but now only 1 GB of GPU memory is in use. Why is there a difference, and how can I clear the memory in the first variant?
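For the two variants above, a back-of-the-envelope size calculation helps. A real double-precision array uses 8 bytes per element, so every 100000-by-1000 array (E, F and both versions of D) is 800,000,000 bytes, about 763 MiB, while C is 80,000,000 bytes and A and B are negligible. The C++ sketch below only does that arithmetic; the constant names are hypothetical and simply mirror the MATLAB variables. It shows that after the clear both variants hold the same 800 MB of referenced data, whereas in the first variant A, B, C, E, F and D were all alive together just before the clear, for a peak of roughly 2.48 GB (about 2365 MiB).

```cpp
#include <cstdint>

// Bytes needed by a dense, real, double-precision array
// (sizeof(double) is 8 on all common platforms).
constexpr std::uint64_t doubleArrayBytes(std::uint64_t rows, std::uint64_t cols) {
    return rows * cols * sizeof(double);
}

// Convert bytes to MiB (2^20 bytes), the unit nvidia-smi reports.
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Sizes of the arrays in the first variant (names mirror the MATLAB code).
constexpr std::uint64_t bytesA = doubleArrayBytes(1, 1000);       // A (B is the same)
constexpr std::uint64_t bytesC = doubleArrayBytes(100000, 100);   // C
constexpr std::uint64_t bytesD = doubleArrayBytes(100000, 1000);  // D (E and F are the same)

// Everything referenced just before "clear E F C A B": A, B, C, E, F and D.
constexpr std::uint64_t peakBytes = 2 * bytesA + bytesC + 3 * bytesD;

// Referenced afterwards, and in the second variant: only D.
constexpr std::uint64_t residentBytes = bytesD;
```

If the readings come from a tool like nvidia-smi, the ~1 GB in the second variant is consistent with D plus the fixed overhead of the CUDA context. One plausible explanation for the extra memory in the first variant, which this thread does not confirm, is that part of the freed peak allocation is kept by MATLAB for reuse instead of being returned to the driver; the arithmetic alone cannot settle that.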



Remi D on 19 Jul 2017
I also think there is a problem. As soon as I call a CUDA MEX file, running reset(gpuDevice) throws an error:
Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The CUDA error was:
all CUDA-capable devices are busy or unavailable
If I don't call reset, I can call the MEX function again and it works fine. But as soon as I use reset, the only way to use the GPU again is to restart MATLAB.
I guess I have to go back to C and leave MATLAB in the drawer when I need parallel computing :(
  1 comment
Joss Knight on 20 Jul 2017
Edited: Joss Knight on 20 Jul 2017
If you are using custom MEX functions, then we'd need to know more about what they're doing to diagnose the problem. Are you storing state, GPU memory, or cuFFT plans? Are you spinning off threads that use the GPU? You may need to register a listener for the GPUDeviceManager's DeviceDeselecting event (see the GPUDeviceManager documentation) so that a call to reset gives you a chance to tidy up your state or wait for threads to finish.
Another very common scenario is that your custom MEX function is hitting a CUDA error, perhaps a serious one, and you are not checking for or clearing that error. If the next thing you do on the GPU is call reset, then that will be the first place where the error is detected and reported. So make sure your MEX function ends with something like:
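#include "mex.h"
#include <cuda_runtime.h>

// At the end of mexFunction, after all GPU work has been launched:
cudaDeviceSynchronize();               // wait for outstanding GPU work to finish
cudaError_t err = cudaGetLastError();  // fetch (and reset) any pending CUDA error
if (err != cudaSuccess) {
    mexPrintf("CUDA error: %s\n", cudaGetErrorString(err));
}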
