Handling memory when working with very large data (.mat) files

I am working with two 5D arrays (A5D and B5D) saved in big_mat_file.mat. Their sizes are given in the code below. The file is around 20GB in total. I want to perform three simple operations on these arrays, as shown in the code. I have access to my university's computing cluster. When I run the following code with 120 workers and 400GB of memory, I receive the following error:
In distcomp/remoteparfor/handleIntervalErrorResult (line 245)
In distcomp/remoteparfor/getCompleteIntervals (line 395)
In parallel_function>distributed_execution (line 746)
In parallel_function (line 578)
Can someone please help me understand what is causing this error? Is it because of low memory? Is there any other way to do the following operations?
clear; clc;
load("big_mat_file.mat");
% it has two very huge 5D arrays "A5D" and "B5D", and two small arrays "as" and "bs"
% size of both A5D and B5D is [41 16 8 80 82]
% size of "as" is [1 80] and size of "bs" is [1 82]
xs = -12:0.1:12;
NX = length(xs);
ys = 0:0.4:12;
NY = length(ys);
total_iterations = NX * NY;
results = zeros(total_iterations , 41 , 16, 8);
XXs = zeros(total_iterations, 1);
YYs = zeros(total_iterations, 1);
parfor idx = 1:total_iterations
    [ix, iy] = ind2sub([NX, NY], idx);
    x = xs(ix);
    y = ys(iy);
    term1 = 1./(exp(1/y*(A5D-x)) + 10); %operation 1
    to_integrate = B5D.*term1; %operation 2
    XXs(idx) = x;
    YYs(idx) = y;
    results(idx, :, :, :) = trapz(as,trapz(bs,to_integrate,5),4); %operation 3
end
XXs = reshape(XXs, [NX, NY]);
YYs = reshape(YYs, [NX, NY]);
results = reshape(results, [NX, NY, 41, 16, 8]);
clear A5D B5D
save('saved_data.mat','-v7.3');

Accepted Answer

Saurabh on 30 Aug 2024
Edited: Saurabh on 30 Aug 2024
It seems you hit this error while processing two large 5D arrays (about 20GB in total) on your university’s computing cluster.
A heterogeneous environment could be one cause of this issue.
The link above describes the system requirements of Parallel Server, not “Parallel Computing Toolbox”, but it makes an important point:
"Parallel processing constructs that work on the infrastructure enabled by parpool—parfor, parfeval spmd, distributed arrays, and message passing functions—cannot be used on a heterogeneous cluster configuration. The underlying MPI infrastructure requires that all cluster computers have matching word sizes and processor endianness."
The same information can be found here:
If this is not the case, try running on "worker" machines with more memory per core (in your case each worker is allocated roughly 3-3.5GB). If that solves the issue, the "workers" must have had insufficient memory.
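The per-worker figure quoted above can be reproduced with a rough estimate. The following is only a sketch: the array size, worker count and memory total are the ones stated in the question, while the element type (real double) and the number of full-size arrays alive per iteration (`nFullSizeArrays`) are assumptions made here for illustration.

```matlab
% Rough per-worker memory estimate (a sketch, not a measurement).
sz           = [41 16 8 80 82];  % size of A5D and B5D as stated in the question
bytesPerElem = 8;                % ASSUMPTION: real double (complex double would be 16)
totalMemGB   = 400;              % memory requested for the job
numWorkers   = 120;              % workers requested for the job

bytesPerArray = prod(sz) * bytesPerElem;   % bytes held by one 5D array
perWorkerGB   = totalMemGB / numWorkers;   % even share of memory per worker

% ASSUMPTION: about five full-size arrays are alive in one iteration
% (A5D, B5D, the exp() intermediate, term1, to_integrate).
nFullSizeArrays = 5;
workingSetGB    = nFullSizeArrays * bytesPerArray / 1e9;

fprintf('Per-worker budget     : %.2f GB\n', perWorkerGB);
fprintf('One 5D array          : %.3f GB\n', bytesPerArray / 1e9);
fprintf('Estimated working set : %.2f GB\n', workingSetGB);
```

The sizes are the ones quoted in the question; since the file is reported as about 20GB, it is worth confirming the actual in-memory sizes and types with `whos` before trusting the estimate. The estimate also ignores the memory used by the worker process itself.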
If insufficient memory is the cause, refer to the link below for troubleshooting steps:
I hope this helps.
1 Comment
Luqman Saleem on 31 Aug 2024
Thank you very much. It was a memory problem. Using fewer workers worked.


More Answers (1)

Sam Marshalik on 30 Aug 2024
You are likely running out of memory on the workers. You are not using sliced input variables (see Sliced Variables - MATLAB & Simulink (mathworks.com)) to access the 5D matrices, so an entire copy of each matrix is sent to every worker. They are likely big enough that you are running out of memory on those machines. I would suggest running fewer workers (to give each worker access to more memory), using sliced input variables so that only part of each matrix is passed to a worker, or running on machines with more memory.
To test this theory, you can run your work and monitor memory usage on those machines - if this is the issue, you should see it max out.
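One way the sliced-input suggestion could look is sketched below. This is not the poster's code: it moves the parfor loop onto the first array dimension so that A5D and B5D are indexed by the loop variable, and runs the (x, y) sweep serially inside each iteration. Small random arrays stand in for the real data, and the grids are shortened; those stand-in values are assumptions chosen only so the sketch runs quickly.

```matlab
% Sketch: parfor over the first dimension so A5D and B5D become sliced inputs.
% ASSUMPTION: small random stand-ins replace the real [41 16 8 80 82] arrays.
sz  = [4 3 2 6 5];
A5D = rand(sz);
B5D = rand(sz);
as  = linspace(0, 1, sz(4));      % stand-in integration grid for dimension 4
bs  = linspace(0, 2, sz(5));      % stand-in integration grid for dimension 5
xs  = -1:0.5:1;     NX = numel(xs);
ys  = 0.4:0.4:1.2;  NY = numel(ys);   % y = 0 left out here: 1/y is Inf there

N1 = sz(1);
results = zeros(NX, NY, N1, sz(2), sz(3));

parfor k = 1:N1
    % Indexing with the loop variable k makes A5D and B5D sliced inputs,
    % so a worker only receives the slabs for the iterations it runs.
    Ak = reshape(A5D(k, :, :, :, :), sz(2:5));
    Bk = reshape(B5D(k, :, :, :, :), sz(2:5));
    Rk = zeros(NX, NY, sz(2), sz(3));
    for ix = 1:NX
        for iy = 1:NY
            term1 = 1 ./ (exp(1/ys(iy) * (Ak - xs(ix))) + 10);   % operation 1
            val   = trapz(as, trapz(bs, Bk .* term1, 4), 3);     % operations 2 and 3
            Rk(ix, iy, :, :) = reshape(val, [1, 1, sz(2), sz(3)]);
        end
    end
    % Sliced output: only the k-th slab of results is written here.
    results(:, :, k, :, :) = reshape(Rk, [NX, NY, 1, sz(2), sz(3)]);
end
```

With this layout each iteration works on slabs that are 1/N1 of the full arrays, and `results` already has the [NX, NY, 41, 16, 8] layout the original code reshapes to. The trade-off is that there are only N1 iterations (41 for the real data), so no more than that many workers can be busy at once.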
1 Comment
Luqman Saleem on 31 Aug 2024
Thank you. It was a memory problem. Using fewer workers worked.


Release

R2024a
