Loop through large number of files and access data outside the loop

dipak sanap on 23 Nov 2015
Commented: Walter Roberson on 23 Nov 2015
I have to run this code over 50,000 files, about 250 GB in total, so I am also looking to improve performance.

Answers (2)

dipak sanap on 23 Nov 2015
Edited: Walter Roberson on 23 Nov 2015
numfiles = 50000
for i = 1:numfiles
f = fopen(sprintf('F%d', i), 'r'); %File names are F1, F2 and so on.
X{i} = fscanf(f, '%d %d %f %d',[4, inf]); % MATLAB warns about preallocating memory.
X{i} = X{i}';
A{i} = X(:,1:3); %I want only the first three columns of X{i}, stored as A(i), and to access them outside the loop, i.e. make
fclose(f);
end
U = union(A(:,1:2),,'rows'); %Take union of first two columns of all A(i)
for j = 1:numfiles
Ua(j) = setdiff(U(:,1:2), A(:,1:2),'rows');%Again access A(i) from previous loop
Ua_z(j) = [Ua(j) zeros(size(Ua,1),1)]; % Add zero column to Ua(j)
AU(j) = [A(i) ; Ua_z(j)]; % Vertically concatenate A(i) and Ua_z(j)
AU_sorted(j) = sortrows(AU); %Sort rows of AU(j)
end
C = [U(:,1:2), AU_sorted(:,3)]; % Horizontally concatenate U and AU_sorted(j)

Walter Roberson on 23 Nov 2015
numfiles = 50000;
A = cell(numfiles, 1);   % preallocate the cell array
for i = 1:numfiles
    f = fopen(sprintf('F%d', i), 'r');   % file names are F1, F2, and so on
    A{i} = fscanf(f, '%d %d %f %*d', [3, inf]).';   % read three columns, skip the fourth
    fclose(f);
end
You do not use column 4, so tell fscanf that it exists but that no value is to be returned for it. With this done you do not need the temporary variable X.
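To see what the `%*d` skip does without touching any files, here is a minimal sketch that applies the same format to a string with sscanf. The sample text is made up for illustration and stands in for two rows of a file.

```matlab
% Hypothetical sample: two rows of four values each, as they might appear in a file.
txt = '1 2 3.5 9 4 5 6.5 8';

% Same format as the fscanf call: read three fields, parse but discard the fourth.
M = sscanf(txt, '%d %d %f %*d', [3, inf]).';

disp(M)   % two rows, three columns: [1 2 3.5; 4 5 6.5]
```

The values are filled column by column into a 3-by-N array, so the transpose gives one row per line of input.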
  3 comments
Walter Roberson on 23 Nov 2015
In your existing code, you were creating X temporarily and using A outside the loop. Now you say that you got rid of A and want to use X outside the loop. Your existing code does not use X outside the loop, only A.
Walter Roberson on 23 Nov 2015
U = unique( cell2mat(cellfun(@(M) M(:,1:2), A(:), 'Uniform', 0)), 'rows');
However, I would suggest
U = unique( cell2mat(cellfun(@(M) unique(M(:,1:2), 'rows'), A(:), 'Uniform', 0)), 'rows');
However, if you know that those rows are unique within each file, then you might as well use the first command. If those rows are not unique within each file then you can save a lot of memory by taking the unique values by file before merging them all together.
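As a small check of the two approaches, the sketch below uses a made-up two-element cell array in place of the A built from the files. It shows that both commands give the same U, and that deduplicating per file shrinks the intermediate matrix when a file repeats a pair.

```matlab
% Hypothetical stand-in for A: each cell holds one file's three columns.
A = { [1 2 0.5; 1 2 0.7; 3 4 1.0];
      [3 4 2.0; 5 6 3.0] };

% Merge the first two columns of every file, then deduplicate once.
allPairs = cell2mat(cellfun(@(M) M(:,1:2), A(:), 'UniformOutput', false));
U1 = unique(allPairs, 'rows');

% Deduplicate within each file first, then merge and deduplicate again.
perFile = cellfun(@(M) unique(M(:,1:2), 'rows'), A(:), 'UniformOutput', false);
merged  = cell2mat(perFile);
U2 = unique(merged, 'rows');

disp(isequal(U1, U2))      % 1: both give [1 2; 3 4; 5 6]
disp(size(allPairs, 1))    % 5 rows held before the final unique
disp(size(merged, 1))      % 4 rows held before the final unique
```

With 50,000 files the saving depends entirely on how many repeated pairs each file contains, so this is worth measuring on a few real files first.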
