gpu memory code optimization
2 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Dear Wizes,
I would appreciate if you could break this: My code includes gpuArray operations inside a for loop; the relevant portion is here:
- % allocate gpu memory:
- A=GPUArray.eye(x,'single'); B=GPUArray.zeros(y,x,'single'); C=GPUArray.zeros(x,y,'single'); % x>>>y
- for n=1:t %for loop begins
- ... % not relevant, B and C are 'filled' by specific matrix multiplications
- D=B*A; % size(D)= (y,x)
- E=C*D; % size(E)= (x,x)
- A=A-E;
- clear E D
- ...
- end
I must mention that all of A,B,C,D,E are different with each iteration in the for loop as they are reused.
The problem is that x is large, and A and E are huge (2 to 7Gb, depending on x), killing my gpu. I made it run, albeit slowly, by breaking E (performing operations row-wise in A for steps 6-7 above:
for i=1:size (A,1)
E=C(i,:)*D;
A(i,:)=A(i,:)-E;
clear E D
1. This works, but is very slow, I was wondering if there is a way to calculate the same for blocks of n rows at once, not one row at a time (with n scaled based on what the gpu can take, where x=kn+p, where p<n); or using mtimesx-like bsxfun routines for matrix multiplication.
2. It would be great if A could be broken in blocks of rows or columns, or in one at a time (row-wise or column-wise), however this is above my job description, given that A is the right multiplier in step 5. This would allow me to expand the size of x I can use.
Thank you, as always Octavio
6 comentarios
Matt J
el 15 de Dic. de 2014
Are none of these matrices sparse? I know that the GPU doesn't support sparse matrices, but if they are sparse, maybe the CPU is better?
Respuestas (0)
Ver también
Categorías
Más información sobre Kernel Creation from MATLAB Code en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!