CUDA_ERROR_ILLEGAL_ADDRESS with "trainNetwork" on NVIDIA RTX 8000

Jakob Nikolas Kather on 12 Feb 2021
Answered: Aditya Patil on 17 Feb 2021
I am using MATLAB R2020b to train a deep neural network for image classification with trainNetwork(). My hardware is an NVIDIA Quadro RTX 8000, and I have CUDA 11.2 installed on Windows Server 2019. My input images are 512x512 px, and I have about 1M training images in total. With a mini-batch size of 512, training works well and uses 22 GB of GPU RAM. However, my GPU has 48 GB of RAM and I want to use all of it, so I increased the mini-batch size to 1024.
When I do this, I get the following error:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In nnet.internal.cnn/DAGNetwork/forwardPropagationWithMemory (line 347)
In nnet.internal.cnn/DAGNetwork/computeGradientsForTraining (line 716)
In nnet.internal.cnn/Trainer/computeGradients (line 200)
In nnet.internal.cnn/Trainer/train (line 119)
In nnet.internal.cnn.trainNetwork.doTrainNetwork (line 91)
In trainNetwork (line 181)
In trainMyNetwork (line 36)
This is the output of gpuDevice:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'Quadro RTX 8000'
Index: 1
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 10.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.1540e+10
AvailableMemory: 4.3099e+10
MultiprocessorCount: 72
ClockRateKHz: 1770000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
It seems to me that MATLAB has a problem accessing more than 24 GB of GPU RAM. How do I fix this?
Thank you!

Answers (1)

Aditya Patil on 17 Feb 2021
Because this error occurs only with the larger batch size, one possibility is that the driver is outdated and is not reporting out-of-memory errors properly. Update the driver to the latest version and try again. If that doesn't work, you can comment here or report a bug to MathWorks via the Contact Us page.
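
To judge whether a genuine out-of-memory condition is plausible here, a rough estimate can be made from the numbers in the question. This is only a sketch: it assumes GPU memory use scales linearly with mini-batch size (not guaranteed), and it treats the reported 22 GB and the AvailableMemory value as being in the same units. The variable names are made up for illustration.

```matlab
% Rough estimate only. Assumes memory use grows linearly with
% mini-batch size, which is an assumption, not documented behavior.
usedAt512GB = 22;                 % reported in the question for batch size 512
availableGB = 4.3099e10 / 1e9;    % AvailableMemory from the gpuDevice output

% Projected memory need when the batch size is doubled to 1024
estimateAt1024GB = usedAt512GB * (1024 / 512);

% Largest batch size that would fit under the same linear assumption
maxBatch = floor(512 * availableGB / usedAt512GB);

fprintf('Estimated need at 1024: %.1f GB, available: %.1f GB\n', ...
    estimateAt1024GB, availableGB);
fprintf('Largest batch size under this estimate: %d\n', maxBatch);
```

Under these assumptions, the projected need at a batch size of 1024 is slightly above the available memory shown by gpuDevice, so the batch may simply not fit. Trying a batch size somewhat below 1024 would help tell an out-of-memory problem apart from a 24 GB addressing limit.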

Release

R2020b
