Why do complex arrays take twice as much memory on the GPU as on the CPU?

Anterrieu
Anterrieu on 22 Oct 2015
Commented: Anterrieu on 23 Oct 2015
>> ver
---------------------------------------------------------------------------------------------
MATLAB Version: 8.3.0.532 (R2014a)
MATLAB License Number: ••••••
Operating System: Linux 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64
Java Version: Java is not enabled
---------------------------------------------------------------------------------------------
MATLAB Version 8.3 (R2014a)
Simulink Version 8.3 (R2014a)
Control System Toolbox Version 9.7 (R2014a)
Curve Fitting Toolbox Version 3.4.1 (R2014a)
Image Processing Toolbox Version 9.0 (R2014a)
MATLAB Compiler Version 5.1 (R2014a)
Mapping Toolbox Version 4.0.1 (R2014a)
Optimization Toolbox Version 7.0 (R2014a)
Parallel Computing Toolbox Version 6.4 (R2014a)
Signal Processing Toolbox Version 6.21 (R2014a)
Statistics Toolbox Version 9.0 (R2014a)
System Identification Toolbox Version 9.0 (R2014a)
>>
>> gpu=gpuDevice(1)
gpu =
CUDADevice with properties:
Name: 'Tesla K40m'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.2079e+10
FreeMemory: 1.1914e+10
MultiprocessorCount: 15
ClockRateKHz: 875500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
>> A=rand(1000,1000); whos('A')
Name Size Bytes Class Attributes
A 1000x1000 8000000 double
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
8126464 bytes
>> clear A B
>>
>> A=complex(rand(1000,1000),rand(1000,1000)); whos('A')
Name Size Bytes Class Attributes
A 1000x1000 16000000 double complex
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
32374784 bytes
>>

Answers (3)

Edric Ellis
Edric Ellis on 23 Oct 2015
Based on the prior comments, I think I understand the problem now. Complex arrays on the GPU take up the same amount of memory as on the CPU, but (especially in R2014a) that can be difficult to see for various reasons. On my machine using R2014a, the following steps:
d = gpuDevice(1);
f1 = d.FreeMemory;
gcx = repmat(gpuArray(1i), 1000, 1000);
f2 = d.FreeMemory;
bytesPerElement = (f1 - f2) / (1000*1000)
demonstrate that a complex gpuArray uses 16 bytes per element, just like on the CPU.
Now, there are two subtleties here that I think are getting in the way of what you're actually trying to achieve:
  1. MATLAB has a memory re-use scheme on the GPU that causes the values returned in FreeMemory to be misleading. FreeMemory shows the number of bytes actually free on the GPU, but it doesn't reflect how much memory is actually available to MATLAB to create new gpuArrays. This is why later releases have a property AvailableMemory, which reflects how many bytes are available to make new gpuArrays.
  2. When transferring an array from the CPU to the GPU, there's a format conversion required. MATLAB on the CPU stores complex data as two separate allocations - the real part, and the imaginary part. On the GPU, the data is stored in a single interleaved allocation. The transformation from split-complex to interleaved-complex is performed on the GPU, and this requires extra space. Therefore, the maximum complex array that can be transferred from the CPU to the GPU is roughly half the total GPU memory size.
You can avoid the problem in (2) if it is possible to construct the array directly on the GPU (as I did in my example) - however I appreciate that's not always possible.
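When the data has to come from the CPU, one possible workaround is to transfer it in column blocks into a destination that was preallocated on the GPU, so that the conversion scratch space is only needed for one block at a time. The following is a minimal sketch, not a tested fix for R2014a: the block width `blockCols` is a hypothetical tuning value, and it assumes that indexed assignment into an already-complex gpuArray does not itself allocate a full-size temporary. It is worth checking the peak usage with gpuDevice on your own release.

```matlab
% Sketch only: copy a complex host array to the GPU one column block at a time.
A = complex(rand(1000, 1000), rand(1000, 1000));   % stand-in for the real host data
[nRows, nCols] = size(A);
blockCols = 100;                                    % hypothetical block width

% Preallocate the complex destination directly on the GPU (same idea as the
% repmat example above); every element is overwritten by the loop below.
B = repmat(gpuArray(1i), nRows, nCols);

for c0 = 1:blockCols:nCols
    cols = c0:min(c0 + blockCols - 1, nCols);       % columns in this block
    B(:, cols) = gpuArray(A(:, cols));              % transfer and store one block
end

% Round-trip check: the GPU copy should match the host array exactly.
assert(isequal(gather(B), A));
```

A smaller `blockCols` should lower the temporary memory needed per transfer at the cost of more host-to-device copies.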
  1 comment
Anterrieu
Anterrieu on 23 Oct 2015
Thank you very much for this detailed answer. You understood the problem very well. Actually, I cannot verify your point 1 because in R2014a there is no AvailableMemory property on the gpuDevice object. As for your point 2, this is exactly how I discovered the problem: in the real situation I need to transfer an 8 GB complex array from the CPU to the GPU, and I cannot, despite the 12 GB of FreeMemory on my GPU!
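The numbers in this comment are consistent with point 2 of the answer. As a rough back-of-the-envelope check, assuming the peak usage during the transfer is about twice the size of the array (the factor of 2 is an approximation taken from that explanation, not a measured value):

```matlab
% Rough estimate only; the factor of 2 is an assumption, not a measurement.
arrayBytes = 8e9;              % complex array to transfer (about 8 GB)
freeBytes  = 1.1914e10;        % FreeMemory reported by gpuDevice in the question
peakBytes  = 2 * arrayBytes;   % interleaved result plus conversion scratch space
fitsOnGpu  = peakBytes <= freeBytes;   % false: about 16 GB would be needed
fprintf('Estimated peak: %.1f GB, free: %.1f GB\n', peakBytes/1e9, freeBytes/1e9);
```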



Matt J
Matt J on 22 Oct 2015
Edited: Matt J on 22 Oct 2015
"FreeMemory" appears to be an undocumented method or property. When I use "AvailableMemory" instead, I get the correct result.
>> A=complex(rand(1000,1000),rand(1000,1000));
>> clear B; m=gpu.AvailableMemory; B=gpuArray(A);
>> fprintf('%d bytes\n',m-gpu.AvailableMemory);
16374784 bytes
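Since AvailableMemory is not present in every release, the same measurement can be written so that it falls back to FreeMemory when needed. This is a sketch under assumptions: it relies on `isprop` reporting correctly whether the device object exposes `AvailableMemory`, and the variable name `memProp` is only illustrative.

```matlab
gpu = gpuDevice;

% Prefer AvailableMemory when this release provides it.
if isprop(gpu, 'AvailableMemory')
    memProp = 'AvailableMemory';
else
    memProp = 'FreeMemory';    % fallback for older releases; may be misleading
end

A = complex(rand(1000, 1000), rand(1000, 1000));
before = gpu.(memProp);        % dynamic property access on the device object
B = gpuArray(A);
wait(gpu);                     % let pending GPU work finish before measuring
fprintf('%s: %d bytes\n', memProp, before - gpu.(memProp));
```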
  6 comments
Matt J
Matt J on 22 Oct 2015
Edited: Matt J on 22 Oct 2015
I have tested on both the GTX 580 and the Titan X. Here's my version info,
Parallel Computing Toolbox Version 6.6 (R2015a)
I suppose this could account for the difference in the output of gpuDevice, though strangely a Google search for "FreeMemory" turns up nothing for me (which led me to think it was undocumented).
Have you independently verified that the GPU is consuming 32 MB? Perhaps it is just being reported incorrectly by gpu.FreeMemory. Edric has said that it is the wrong thing to use.
Anterrieu
Anterrieu on 22 Oct 2015
I will ask my admin to install R2015b to check whether the problem persists. The GPU really is using this memory: when I send an array that occupies 8 GB on the CPU to my GPU, which has 12 GB, I get a message saying that the GPU does not have enough memory. This is actually how I discovered the problem.



Lessmann
Lessmann on 22 Oct 2015
Hi,
this behaviour is not a difference between the CPU and the GPU. In general, a complex number uses twice the memory of a real one.
Name Size Bytes Class Attributes
A 5x5 200 double
B 5x5 400 double complex
MATLAB uses two doubles to store the real and imaginary parts, so a complex array needs twice the memory.
  1 comment
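The figures above can be reproduced on the CPU with a short check. This is a minimal sketch; the variable names `infoA` and `infoB` are only illustrative.

```matlab
A = rand(5);                       % 5x5 real double: 25 elements * 8 bytes
B = complex(rand(5), rand(5));     % 5x5 complex double: 25 elements * 16 bytes
infoA = whos('A');
infoB = whos('B');
fprintf('%d bytes real, %d bytes complex\n', infoA.bytes, infoB.bytes);
% prints "200 bytes real, 400 bytes complex"
```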
Anterrieu
Anterrieu on 22 Oct 2015
Thank you, but I know that a complex is twice the size of a double. The question was, and still is: why is a complex array twice as large on the GPU as on the CPU? Taking your example, on the GPU A is 200 bytes, as on the CPU, but B is 800 bytes, twice the amount on the CPU. Why?
