Borrar filtros
Borrar filtros

Converting parallel CPU processing into GPU processing

12 visualizaciones (últimos 30 días)
Douglas Miller
Douglas Miller el 11 de Mzo. de 2022
Comentada: Walter Roberson el 12 de Mzo. de 2022
I am trying to convert code that ran in parallel on CPU cores into parallel processing on the gpu.
I would like to process matrices in a cell array on the GPU in parallel for how many cores are present on the gpu. However, it performs significantly slower than on a parallel CPU processor of 4 cores (25 cells processed in 30 minutes on 4 CPU cores, 5 cells is currently taking over 45 minutes to process on GPU and is still not finished). I'm very new to GPU computing and nothing seemed really obvious on how to speed this up.
GPU properties:
Data to be processed:
  • series is a 568x1 cell array
  • each cell is a 60x60 double (each entry is a value between -1 and 1)
Start processing
tic % test
for i = 1:5
cell_array{i} = gpuArray(cleanSeries{i});
end
Determine size of matrix within the first cell, equivalent to number of biological cells recorded
numCells = gpuArray(length(cell_array{1}));
Preallocate arrays for data
clust_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
Initiate the processing
parfor cellNumber = 1:length(cell_array)
threshold_clust = gpuArray(NaN(numCells,100));
random_clust = gpuArray(NaN(numCells,100));
% process data over varying proportional thresholds starting at 25%
% strongest to fully connected (%100) at 25% steps i.e. 25%, 50%, 75%,
% 100%
for threshold = 25:25:100
threshold_matrix = (threshold_proportional(cell_array{cellNumber}, threshold/100)); % proportional threshold matrix - custom function
% clustering requires that all values be between 0 and 1 so remove
% any negatives
threshold_matrix(threshold_matrix < 0) = 0;
% ensure that randomizing the matrix is possible
[rowi,coli] = find(tril(threshold_matrix));
bothi = [rowi coli];
c = bothi(1,1);
d = bothi(1,2);
e=find(c==bothi);
f=find(d==bothi);
if length(e)==length(bothi)||length(f)==length(bothi)
disp(['One cell has all the connections, skipping ', int2str(threshold), '% threshold.'])
threshold_clust(:,threshold) = NaN(numCells,1);
random_clust(:,threshold) = NaN(numCells,1);
elseif length(bothi) <=3
threshold_clust(:,threshold) = NaN(numCells,1);
random_clust(:,threshold) = NaN(numCells,1);
else
% create random matrix - custom function
random_matrix = latmio_und(threshold_matrix,1000);
% clustering coefficient per matrix - custom function
threshold_clust(:,threshold) = clustering_coef_wu(threshold_matrix);
random_clust(:,threshold) = clustering_coef_wu(random_matrix);
end % if logic end
end % for loop end
% concatenate over thresholds
clust_mean(:,cellNumber) = mean(threshold_clust,2,'omitnan');
clust_std(:,cellNumber) = std(threshold_clust,0,2,'omitnan');
clust_random_mean(:,cellNumber) = mean(random_clust,2,'omitnan');
clust_random_std(:,cellNumber) = std(random_clust,0,2,'omitnan');
end % parfor loop end
gather(clust_mean);
gather(clust_std)
gather(clust_random_std);
gather(clust_random_mean);
toc
  6 comentarios
Douglas Miller
Douglas Miller el 12 de Mzo. de 2022
According to the other post, it sounds like running in parallel isn't feasible on the GPU the way I was hoping. But I had never considered that zeros would process quicker. That will definitely help optimize the code. Thank you so much!
Walter Roberson
Walter Roberson el 12 de Mzo. de 2022
For operations other than pure copying, NaN has to go through a special "Abort" path in all calculations; calculations with it cannot stream the normal way. There also has to be special checking to see if the NaN is a "signalling NaN" as signalling NaN are required to raise exceptions whenever they occur.
inf cannot readily stream either... but I guess a bit more readily than NaN.

Iniciar sesión para comentar.

Respuestas (1)

Matt J
Matt J el 12 de Mzo. de 2022
Editada: Matt J el 12 de Mzo. de 2022
I would like to process matrices in a cell array on the GPU in parallel for how many cores are present on the gpu.
No, GPU cores cannot act like parpool workers. They are a completely different animal.

Categorías

Más información sobre GPU Computing en Help Center y File Exchange.

Productos


Versión

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by