1. MATLAB has a memory re-use scheme on the GPU that makes the values returned in FreeMemory misleading. FreeMemory shows the number of bytes actually free on the GPU, but it doesn't reflect how much memory is available to MATLAB for creating new gpuArrays. This is why later releases have an AvailableMemory property, which reports how many bytes are available for new gpuArrays.
2. When transferring an array from the CPU to the GPU, a format conversion is required. MATLAB on the CPU stores complex data as two separate allocations: the real part and the imaginary part. On the GPU, the data is stored in a single interleaved allocation. The transformation from split-complex to interleaved-complex is performed on the GPU, and this requires extra space. Therefore, the largest complex array that can be transferred from the CPU to the GPU is roughly half the total GPU memory size.
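As a rough illustration of the split-complex to interleaved-complex conversion described above, the sketch below mimics the two layouts with ordinary real arrays on the CPU. It is not MATLAB's actual transfer code. The variable names (`re`, `im`, `interleaved`, `peakBytes`) are made up for this example, and the assumption that both layouts exist at the same time during the conversion is a simplified model of why extra space is needed.

```matlab
% Illustration only: mimic the two complex layouts with plain real arrays.
re = [1 2 3];      % stands in for the separate "real part" allocation
im = [10 20 30];   % stands in for the separate "imaginary part" allocation

% Interleave as re(1), im(1), re(2), im(2), ...
% [re; im] is 2-by-3, and reshape walks it column by column.
interleaved = reshape([re; im], 1, []);

% Assumed model: split and interleaved copies coexist during the conversion.
splitBytes       = 8 * (numel(re) + numel(im));   % 8 bytes per double
interleavedBytes = 8 * numel(interleaved);
peakBytes        = splitBytes + interleavedBytes;

fprintf('final: %d bytes, peak during conversion: %d bytes\n', ...
        interleavedBytes, peakBytes);
```

Under this model the peak footprint is twice the size of the final array, which is consistent with the limit of roughly half the total GPU memory mentioned above.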
Why do complex arrays take twice as much memory on the GPU as on the CPU?
>> ver
---------------------------------------------------------------------------------------------
MATLAB Version: 8.3.0.532 (R2014a)
MATLAB License Number: ••••••
Operating System: Linux 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64
Java Version: Java is not enabled
---------------------------------------------------------------------------------------------
MATLAB                                                Version 8.3        (R2014a)
Simulink                                              Version 8.3        (R2014a)
Control System Toolbox                                Version 9.7        (R2014a)
Curve Fitting Toolbox                                 Version 3.4.1      (R2014a)
Image Processing Toolbox                              Version 9.0        (R2014a)
MATLAB Compiler                                       Version 5.1        (R2014a)
Mapping Toolbox                                       Version 4.0.1      (R2014a)
Optimization Toolbox                                  Version 7.0        (R2014a)
Parallel Computing Toolbox                            Version 6.4        (R2014a)
Signal Processing Toolbox                             Version 6.21       (R2014a)
Statistics Toolbox                                    Version 9.0        (R2014a)
System Identification Toolbox                         Version 9.0        (R2014a)
>>
>> gpu=gpuDevice(1)
gpu =
    CUDADevice with properties:
                        Name: 'Tesla K40m'
                       Index: 1
           ComputeCapability: '3.5'
              SupportsDouble: 1
               DriverVersion: 6
              ToolkitVersion: 5.5000
          MaxThreadsPerBlock: 1024
            MaxShmemPerBlock: 49152
          MaxThreadBlockSize: [1024 1024 64]
                 MaxGridSize: [2.1475e+09 65535 65535]
                   SIMDWidth: 32
                 TotalMemory: 1.2079e+10
                  FreeMemory: 1.1914e+10
         MultiprocessorCount: 15
                ClockRateKHz: 875500
                 ComputeMode: 'Default'
        GPUOverlapsTransfers: 1
      KernelExecutionTimeout: 0
            CanMapHostMemory: 1
             DeviceSupported: 1
              DeviceSelected: 1
>> A=rand(1000,1000); whos('A')
  Name         Size                Bytes  Class     Attributes
    A         1000x1000            8000000  double              
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
8126464 bytes
>> clear A B
>>
>> A=complex(rand(1000,1000),rand(1000,1000)); whos('A')
  Name         Size                 Bytes  Class     Attributes
    A         1000x1000            16000000  double    complex   
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
32374784 bytes
>>
Answers (3)
  Edric Ellis on 23 Oct 2015
Based on the prior comments, I think I understand the problem now. Complex arrays on the GPU take up the same amount of memory as on the CPU, but (especially in R2014a) it can be difficult to see that, for various reasons. On my machine using R2014a, the following steps:
d = gpuDevice(1);
f1 = d.FreeMemory;
gcx = repmat(gpuArray(1i), 1000, 1000);
f2 = d.FreeMemory;
bytesPerElement = (f1 - f2) / (1000*1000)
demonstrate that a complex gpuArray uses 16 bytes per element, just like on the CPU.
Now, there are two subtleties here (the two points listed at the top of this page) that I think are getting in the way of what you're actually trying to achieve.
You can avoid the problem in (2), the extra space needed to convert split-complex data to interleaved-complex data during transfer, if you can construct the array directly on the GPU (as I did in my example). However, I appreciate that's not always possible.
  Matt J on 22 Oct 2015 (edited 22 Oct 2015)
      "FreeMemory" appears to be an undocumented method or property. When I use "AvailableMemory" instead, I get the correct result.
>> A=complex(rand(1000,1000),rand(1000,1000));
>> clear B; m=gpu.AvailableMemory; B=gpuArray(A); 
>> fprintf('%d bytes\n',m-gpu.AvailableMemory);
16374784 bytes
6 comments
  Matt J on 22 Oct 2015 (edited 22 Oct 2015)
			I have tested on both the GTX 580 and the Titan X. Here's my version info,
Parallel Computing Toolbox       Version 6.6        (R2015a)
I suppose this could account for the difference in the output of gpuDevice, though strangely a google search on "FreeMemory" doesn't show up for me anywhere (leading me to have thought that it was undocumented).
Have you independently verified that the GPU is consuming 32 MB? Perhaps it is just being reported incorrectly by gpu.FreeMemory. Edric has said that it is the wrong thing to use.
  Lessmann on 22 Oct 2015
Hi,
this behaviour is not a difference between CPU and GPU. In general, a complex number uses twice the memory of a real one.
Name      Size            Bytes  Class     Attributes
A         5x5               200  double              
B         5x5               400  double    complex
MATLAB uses two doubles to store the real and the imaginary part, so twice the memory is needed.
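A listing like the one above can be reproduced on the CPU with a short sketch such as the following. The variable names `infoA` and `infoB` are chosen for this example only; `whos` with an output argument returns a struct whose `bytes` field holds the size of the variable.

```matlab
% Compare the CPU memory footprint of a real and a complex 5x5 double array.
A = rand(5);            % 25 real doubles
B = complex(A, A);      % 25 complex doubles (real and imaginary parts)

infoA = whos('A');      % struct describing A, including its size in bytes
infoB = whos('B');

fprintf('A: %d bytes, B: %d bytes\n', infoA.bytes, infoB.bytes);
```

Each real double takes 8 bytes and each complex double takes 16, so the complex array should report twice the bytes of the real one.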