Huge number of iterations

Qammar Abbas on 20 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
Hi community members,
I am generating chemical formulas of compounds by forming combinations of elements and storing them in a text file. By my calculation, the total number of combinations is 18,217,382,400, i.e. I need 18,217,382,400 for-loop iterations. I want to do this as quickly as possible. Please suggest an efficient method. I have tried both for and parfor, and both take too long. A snippet of my code is shown below. I am using 2 workers, and the code has been running for more than 24 hours now. How can I improve the speed?
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
iterations=[length(a) length(b)]; % a and b are cell arrays. Length of a is 10944 length of b is 1664600
tic
parfor ix=1:prod(iterations)
ix
[d,e]=ind2sub(iterations,ix);
fprintf(w.Value, '%s\n', strcat(a{d},b{e}));
end
toc
clear w;
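As a sanity check on the numbers in the question, here is a minimal sketch using tiny, made-up cell arrays (the element names are hypothetical stand-ins; the real a and b are not shown in the thread). It illustrates that the linear index passed through ind2sub visits the same pairs as two nested loops with d varying fastest, and that 10944 times 1664600 is indeed 18,217,382,400.

```matlab
% Toy stand-ins for the real cell arrays (hypothetical data).
a = {'Na', 'K'};
b = {'Cl', 'Br', 'I'};
iterations = [numel(a), numel(b)];

% Linear-index version, following the loop in the question.
viaInd2sub = cell(1, prod(iterations));
for ix = 1:prod(iterations)
    [d, e] = ind2sub(iterations, ix);
    viaInd2sub{ix} = [a{d}, b{e}];
end

% Equivalent nested loops: d varies fastest, which matches the
% column-major order that ind2sub uses.
viaNested = cell(1, prod(iterations));
k = 0;
for e = 1:numel(b)
    for d = 1:numel(a)
        k = k + 1;
        viaNested{k} = [a{d}, b{e}];
    end
end

% Size of the full problem, from the lengths stated in the question.
totalCombos = 10944 * 1664600;
```

Because both versions enumerate the same pairs, the ind2sub call in every iteration can be dropped in favour of plain loop indices.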
6 Comments
Qammar Abbas on 21 Sep 2021
This is something I can't share. However, I can tell you that it is a necessary requirement.
Rik on 21 Sep 2021
Then you should probably consider buying computation time on some sort of cluster. If you don't tell us what you want to do, we can't suggest a way to avoid some of the computational work. Things take time. Sometimes the most efficient way is to reduce the number of things.


Answers (1)

Walter Roberson on 20 Sep 2021
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
% a and b are cell arrays. Length of a is 10944 length of b is 1664600
b = b(:);
tic
iterations = length(a);
parfor ix=1:iterations
    % '' is needed here: strjoin inserts a space between pieces by default
    outs = strjoin(strcat(a(ix), b, {newline}), ''); %a(ix) is deliberate in case a{ix} has whitespace
    fwrite(w.Value, outs);
end
toc
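The idea in this answer is to build all lines for one entry of a in memory and write them with a single call, so the loop runs once per element of a rather than once per pair. A minimal sketch of the string-building step on made-up data (the element names are hypothetical, and b is kept as a row here for simplicity):

```matlab
% Toy stand-ins for the real cell arrays (hypothetical data).
a = {'Na', 'K'};
b = {'Cl', 'Br', 'I'};
nl = sprintf('\n');            % the same character that newline returns

ix = 1;
% strcat with cell inputs expands the scalar cells a(ix) and {nl}
% against b, giving one 'prefix + suffix + newline' piece per entry of b.
parts = strcat(a(ix), b, {nl});

% Join the pieces with an empty delimiter to get one char vector.
block = strjoin(parts, '');

% For comparison: without a delimiter, strjoin puts a space between pieces.
withSpaces = strjoin(parts);
```

Here block holds the three lines NaCl, NaBr and NaI, each terminated by a newline, and can be handed to fwrite in one call. The version without the delimiter is two characters longer because of the inserted spaces.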
4 Comments
Walter Roberson on 22 Sep 2021
Huh. I really expected the fprintf version to be slower!
Notice that I build the fprintf format dynamically to include the current content from a. I assumed here that a does not contain any % characters.
NA = 100;
NB = 10000;
letters = ['A':'Z', '0':'9']; nlet = length(letters);
maxword = 5;
a = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NA), 'uniform', 0);
b = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NB), 'uniform', 0);
tn = tempname();
cleanME = onCleanup(@() delete(tn));
t1 = timeit(@() use_fprintf(tn, a, b), 0);
use_fprintf bytes = 7240000 (printed once per timeit call)
t2 = timeit(@() use_strjoin(tn, a, b), 0);
use_strjoin bytes = 7240000 (printed once per timeit call)
t3 = timeit(@() use_horzcat(tn, a, b), 0);
use_horzcat bytes = 7240000 (printed once per timeit call)
struct('fprintf', t1, 'strjoin', t2, 'horzcat', t3)
ans = struct with fields:
    fprintf: 0.5896
    strjoin: 2.6799
    horzcat: 2.4606
function use_fprintf(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        fmt = sprintf('%s%%s\\n', a{K});
        fprintf(fid, fmt, b{:});
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_fprintf bytes = %d\n', dinfo.bytes);
end

function use_strjoin(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        outs = strjoin(strcat(a(K), b, {newline}), '');
        fwrite(fid, outs);
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_strjoin bytes = %d\n', dinfo.bytes);
end

function use_horzcat(tn, a, b)
    fid = fopen(tn, 'w');
    for K = 1 : length(a)
        temp = strcat(a(K), b, {newline});
        outs = [temp{:}];
        fwrite(fid, outs);
    end
    fclose(fid);
    dinfo = dir(tn);
    fprintf('use_horzcat bytes = %d\n', dinfo.bytes);
end
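The fprintf variant relies on two things: the format is reapplied to each remaining argument, so one call emits one line per entry of b, and the text taken from a is pasted into the format itself, which is why a % in a would break it. A minimal sketch of both points follows. It uses sprintf so the result can be inspected without a file, and the sample entry and the escaping step are illustrative assumptions, not part of the original benchmark.

```matlab
% Hypothetical entry of a that contains a percent sign and a backslash-free name.
aK = '50%Fe';
b  = {'O', 'S'};

% Escape characters that are special in a format: % first, then backslash.
safe = strrep(strrep(aK, '%', '%%'), '\', '\\');

% Same shape as the format built in use_fprintf: prefix, then %s and \n.
fmt = [safe, '%s\n'];

% The format is recycled for every element of b, one line per element.
out = sprintf(fmt, b{:});
```

With the escaping in place, out contains the lines 50%FeO and 50%FeS. If a is known to hold only letters and digits, as in the benchmark data, the strrep step is unnecessary.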
Qammar Abbas on 22 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
I have tried your first code and, as @Benjamin explained, it is indeed a very good solution to my problem. However, I observed that the execution time drops further if we use 'for' instead of 'parfor' in your first code. By my calculation, I need at most 2 days to generate all 18,217,382,400 combinations using a for loop. I have started running the code and will hopefully get back to you with the results in 2-3 days. Meanwhile, I am trying to understand the second code you shared. Thank you for your help.
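For a rough sense of scale, the benchmark figures posted above can be extrapolated to the full problem. This is only a back-of-the-envelope sketch: it assumes the per-line cost and per-line size measured on the random test strings (100 by 10000 entries, 0.5896 s, 7240000 bytes) carry over to the real data and hardware, which may well not hold.

```matlab
% Figures taken from the thread.
totalLines   = 10944 * 1664600;   % combinations in the real problem
benchLines   = 100 * 10000;       % NA * NB in the benchmark
benchSeconds = 0.5896;            % timeit result for the fprintf variant
benchBytes   = 7240000;           % file size reported by the benchmark

% Linear extrapolation (an assumption, not a measurement).
estHours = totalLines * (benchSeconds / benchLines) / 3600;
estGB    = totalLines * (benchBytes / benchLines) / 1e9;

% Throughput needed to finish in the 2 days estimated above.
requiredLinesPerSecond = totalLines / (2 * 24 * 3600);
```

Under these assumptions the fprintf approach would take about 3 hours and produce roughly 132 GB of text, while the 2-day estimate corresponds to about 105,000 lines per second. Disk speed is likely to matter at that file size.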

Release

R2020a
