What is the optimal block size for importing a big .csv with textscan()?

I have .csv files as large as ~17 GB and only 8 GB of RAM, so I import them in blocks. I noticed that reading as many lines as possible per call (and thus using fewer iterations) is not the fastest approach.
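A block-wise import of this kind can be sketched as follows. This is only a minimal illustration: the small temporary file, its three-column format, and the names blockSize, nRows and nBlocks are hypothetical stand-ins, not the real TAQ data. The header is skipped once with fgetl rather than with 'HeaderLines', since as far as I know textscan applies 'HeaderLines' on every call and would otherwise drop one data line per block.

```matlab
% Hypothetical demo file standing in for the large .csv.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'sym,price,size\n');              % header line
for k = 1:10
    fprintf(fid, 'ABC,%.2f,%d\n', 10 + k/100, 100*k);
end
fclose(fid);

blockSize = 4;     % lines per textscan call (the quantity being tuned)
nRows     = 0;     % total data lines read so far
nBlocks   = 0;     % number of non-empty blocks read

fid = fopen(fname, 'r');
fgetl(fid);        % skip the header once, before the loop
while ~feof(fid)
    blk = textscan(fid, '%s%f32%u32', blockSize, 'Delimiter', ',');
    if isempty(blk{1})
        break      % nothing left to read
    end
    % ... process blk here, then let it be overwritten ...
    nRows   = nRows + numel(blk{1});
    nBlocks = nBlocks + 1;
end
fclose(fid);
delete(fname);

fprintf('%d rows read in %d blocks\n', nRows, nBlocks);
```

Only one block is held in memory at a time, so blockSize trades memory use against the number of textscan calls.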
Here is the test, run on Windows 7 64-bit, i7-2600, MATLAB R2013a:
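n = 50;                                     % timed textscan calls per block size
opt = {'Delimiter',',','HeaderLines',1};
N = [500 1e3 5e3 1e4:1e4:1e5 2e5:1e5:1e6];  % block sizes (lines per call)
t = zeros(n,numel(N));                      % t(ii,jj): seconds for call ii at block size N(jj)
for jj = 1:numel(N)                         % loop over all block sizes (N has 22 entries)
    disp(N(jj))
    fid = fopen('C:\TAQ\8e1e9fb052f2b2b6.csv');
    for ii = 1:n
        tic
        foo = textscan(fid, '%s%u32%u8:%u8:%u8%f32%u32%u16%u16%s%c', N(jj), opt{:});
        t(ii,jj) = toc;
    end
    fclose(fid);
end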
The results (y-axis: seconds per textscan call; x-axis: number of lines imported per call):
QUESTION: Do these results look unusual to you, and what might cause the substantial increase in time beyond 1e5 lines per call? The I/O buffer?
Note: 1e6 lines correspond to roughly 40 MB.
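To put the block sizes in perspective, the figures above give a rough count of lines in a 17 GB file and of textscan calls needed per block size. This is a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1e9 bytes) and a uniform line length, and the variable names are only illustrative.

```matlab
fileBytes    = 17e9;                 % ~17 GB file (assuming 1 GB = 1e9 bytes)
bytesPerLine = 40e6 / 1e6;           % ~40 MB per 1e6 lines, i.e. ~40 bytes per line
totalLines   = fileBytes / bytesPerLine;

N = [500 1e3 5e3 1e4:1e4:1e5 2e5:1e5:1e6];   % block sizes from the test
nCalls = ceil(totalLines ./ N);              % textscan calls needed per block size

fprintf('approx. %.3g lines in total\n', totalLines);
fprintf('N = %7d  ->  %d calls\n', [N; nCalls]);
```

Even the largest block tested is a small fraction of the available 8 GB of RAM, so memory capacity alone would not seem to explain the slowdown.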

Answers (0)
