Loading a specified column with matfile is too slow
I have a MAT-file containing a 5e6-by-50 matrix.
I want to load only a specific column into memory, but I found that reading one column takes nearly as long as reading the whole variable.
Below is the test code:
A=rand(5e6,50);
save A A
f=matfile('A');
tic
tmp=f.A;
toc
tic
tmp=f.A(:,1);
toc
Is there any way to improve the performance?
Thanks!
Yu
2 comments
Walter Roberson
on 3 Oct 2018
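I notice you did not explicitly save with -v7.3, so you might be getting a -v7 file. Partial reads through matfile need a -v7.3 file; with older formats, indexing into the variable loads the whole variable first.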
When I test on my system with -v7.3, selecting one column comes out roughly 10% faster. Not as good as one might hope, though.
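One way to check which format a file really has is to read the descriptive text at the start of the file. The following is a minimal sketch, assuming that -v7.3 files begin with the text "MATLAB 7.3 MAT-file" (older formats begin with "MATLAB 5.0 MAT-file"); the temporary file name and the small matrix are illustrative only.

```matlab
% Sketch: save explicitly as -v7.3, then inspect the file's text header.
fname = [tempname '.mat'];            % hypothetical temporary file
A = rand(100, 5);                     % small stand-in matrix
save(fname, 'A', '-v7.3');

fid = fopen(fname, 'r');
hdr = fread(fid, 116, 'uint8=>char').';   % first 116 bytes hold descriptive text
fclose(fid);

disp(strtrim(hdr));
% Assumption: a -v7.3 file starts with 'MATLAB 7.3'
isV73 = strncmp(hdr, 'MATLAB 7.3', 10);
delete(fname);
```

If isV73 is false for your own file, re-saving it with -v7.3 is worth trying before timing partial reads.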
Yu Li
on 3 Oct 2018
Answers (1)
Walter Roberson
on 3 Oct 2018
In this case, you can do much better by saving with -nocompression:
Save -v7.3 -nocompression
Elapsed time is 17.814604 seconds.
Done save -v7.3 -nocompression
Start matfile -v7.3 -nocompression
Elapsed time is 0.016278 seconds.
Done matfile -v7.3 -nocompression
Start recall entire variable -v7.3 -nocompression
Elapsed time is 2.195975 seconds.
Done recall entire variable -v7.3 -nocompression
Start recall one column -v7.3 -nocompression
Elapsed time is 1.089280 seconds.
Done recall one column -v7.3 -nocompression
Save -v7.3
Elapsed time is 58.543461 seconds.
Done save -v7.3
Start matfile -v7.3
Elapsed time is 0.077814 seconds.
Done matfile -v7.3
Start recall entire variable -v7.3
Elapsed time is 10.139135 seconds.
Done recall entire variable -v7.3
Start recall one column -v7.3
Elapsed time is 9.118167 seconds.
Done recall one column -v7.3
Source code:
A=rand(5e6,50);
time_it(A, {'-v7.3' '-nocompression'})
time_it(A, {'-v7.3'})
function time_it(A, saveoptions)
savedesc = strjoin(saveoptions, ' ');
fprintf('Save %s\n', savedesc);
tic
save('A', 'A', saveoptions{:});
toc
fprintf('Done save %s\n', savedesc);
fprintf('Start matfile %s\n', savedesc);
tic
f = matfile('A');
toc
fprintf('Done matfile %s\n', savedesc);
fprintf('Start recall entire variable %s\n', savedesc);
tic
tmp=f.A;
toc
fprintf('Done recall entire variable %s\n', savedesc);
fprintf('Start recall one column %s\n', savedesc);
tic
tmp=f.A(:,1);
toc
fprintf('Done recall one column %s\n', savedesc);
end
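For everyday use, as opposed to benchmarking, the workflow reduces to saving once without compression and then indexing through the matfile object. This is a minimal sketch under stated assumptions: the temporary file name, the reduced matrix size, and the column index k are illustrative, and -nocompression is only accepted together with -v7.3 (it was introduced in R2017a).

```matlab
% Sketch: save once without compression, then read a single column.
fname = [tempname '.mat'];                    % hypothetical temporary file
A = rand(1e5, 50);                            % small stand-in for the 5e6-by-50 matrix
save(fname, 'A', '-v7.3', '-nocompression');  % -nocompression requires -v7.3

f = matfile(fname);                 % creates the object without loading A
[nrows, ncols] = size(f, 'A');      % query the dimensions without reading data
k = 7;                              % column to read (arbitrary choice)
col = f.A(:, k);                    % partial read of one column

clear f
delete(fname);
```

The trade-off is a larger file on disk in exchange for faster saves and reads.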
7 comments
Walter Roberson
on 3 Oct 2018
It looks like accessing by rows is an even bigger improvement: reading one row takes about 0.013 seconds, even for the compressed version.
A=rand(5e6,50);
time_it(A, {'-v7.3' '-nocompression'})
time_it(A, {'-v7.3'})
function time_it(A, saveoptions)
savedesc = strjoin(saveoptions, ' ');
fprintf('Save %s\n', savedesc);
tic
save('A', 'A', saveoptions{:});
toc
fprintf('Done save %s\n', savedesc);
fprintf('Start matfile %s\n', savedesc);
tic
f = matfile('A');
toc
fprintf('Done matfile %s\n', savedesc);
fprintf('Start recall entire variable %s\n', savedesc);
tic
tmp=f.A;
toc
fprintf('Done recall entire variable %s\n', savedesc);
fprintf('Start recall one column %s\n', savedesc);
tic
tmp=f.A(:,1);
toc
fprintf('Done recall one column %s\n', savedesc);
fprintf('Start recall one row %s\n', savedesc);
tic
tmp=f.A(1,:);
toc
fprintf('Done recall one row %s\n', savedesc);
fprintf('\n');
end
Yu Li
on 3 Oct 2018
Walter Roberson
on 3 Oct 2018
Edited: Walter Roberson
on 3 Oct 2018
See my modification to work by rows. Though now that I think of it, a row is only 50 items. Hmmmm, more testing needed.
Yu Li
on 3 Oct 2018
Walter Roberson
on 3 Oct 2018
Version #3 adds per-point timing (that is, the cost per matrix element retrieved). It times both A and A.' to try to account for the different row and column sizes. Taking the different sizes into account, the summary seems to be that for this data, -nocompression is much faster. When compression is on, reading a row might be slightly faster than reading a column, but there is probably not enough difference to make it worth rewriting code for rows vs columns.
A=rand(5e6,50);
time_it(A, {'-v7.3' '-nocompression'}, '7.3 no compression')
time_it(A, {'-v7.3'}, '7.3 w/ compression')
Atrans = A.';
time_it(Atrans, {'-v7.3' '-nocompression'}, 'flipped 7.3 no compression')
time_it(Atrans, {'-v7.3'}, 'flipped 7.3 w/ compression');
function time_it(A, saveoptions, savedesc)
fprintf('Save %s\n', savedesc);
tic
save('A', 'A', saveoptions{:});
toc
fprintf('Done save %s\n', savedesc);
fprintf('Start matfile %s\n', savedesc);
tic
f = matfile('A');
toc
fprintf('Done matfile %s\n', savedesc);
fprintf('Start recall entire variable %s\n', savedesc);
tic
tmp=f.A;
t = toc;
fprintf('%f seconds elapsed, %g per point\nDone recall entire variable %s\n', t, t/numel(tmp), savedesc);
fprintf('Start recall one column %s\n', savedesc);
tic
tmp=f.A(:,1);
t = toc;
fprintf('%f seconds elapsed, %g per point\nDone recall one column %s\n', t, t/numel(tmp), savedesc);
fprintf('Start recall one row %s\n', savedesc);
tic
tmp=f.A(1,:);
t = toc;
fprintf('%f seconds elapsed, %g per point\nDone recall one row %s\n', t, t/numel(tmp), savedesc);
fprintf('\n');
end
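The same per-point normalization can be applied to the timings posted in the answer above (whole variable vs. one column of the 5e6-by-50 matrix). This is only a worked calculation on those reported numbers, not a new measurement; the variable names are illustrative.

```matlab
% Per-point cost computed from the elapsed times reported in the answer.
nrows = 5e6;
ncols = 50;

tAllNoComp = 2.195975;    % whole variable, -v7.3 -nocompression
tColNoComp = 1.089280;    % one column,     -v7.3 -nocompression
tAllComp   = 10.139135;   % whole variable, -v7.3 (compressed)
tColComp   = 9.118167;    % one column,     -v7.3 (compressed)

% Rows: no compression, compression. Columns: whole variable, one column.
perPoint = [tAllNoComp/(nrows*ncols), tColNoComp/nrows; ...
            tAllComp/(nrows*ncols),   tColComp/nrows];

fprintf('no compression: %g s/point (all), %g s/point (column)\n', perPoint(1,:));
fprintf('compression:    %g s/point (all), %g s/point (column)\n', perPoint(2,:));

% How much more expensive each element is when only one column is read
penalty = perPoint(:,2) ./ perPoint(:,1);
```

In those runs, each element of a single-column read cost roughly 25 times as much as in a full read without compression, and roughly 45 times as much with compression, which is why reading one column saved so little wall-clock time.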
Yu Li
on 3 Oct 2018
Walter Roberson
on 3 Oct 2018
You can refer them to this post and my test code.
One thing they are likely to point out is that compression is on by default because it is intended for "real" data, not for rand(), and that when you read or write with compression, the performance would be expected to vary with how compressible the data is. You should therefore probably run this code with the rand() call replaced by a load of one of your actual matrices.
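To get a feel for how much compressibility matters, the on-disk sizes of random data and of a highly compressible matrix can be compared. This is a rough sketch: the all-zeros matrix is a deliberately extreme stand-in for "real" data, and the sizes and temporary file names are assumptions chosen to keep the run short.

```matlab
% Sketch: compare on-disk size of random vs. highly compressible data.
n = 1e5;
R = rand(n, 10);          % essentially incompressible
Z = zeros(n, 10);         % extremely compressible (illustrative extreme)

fR = [tempname '.mat'];   % hypothetical temporary files
fZ = [tempname '.mat'];
save(fR, 'R', '-v7.3');   % default settings: compression on
save(fZ, 'Z', '-v7.3');

dR = dir(fR);
dZ = dir(fZ);
fprintf('rand:  %d bytes on disk for %d bytes in memory\n', dR.bytes, 8*numel(R));
fprintf('zeros: %d bytes on disk for %d bytes in memory\n', dZ.bytes, 8*numel(Z));

delete(fR);
delete(fZ);
```

Real measurement data usually falls somewhere between these two extremes, so timing with your own matrices is the only reliable guide.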