How to read a large text file quickly (exceeding 20 GB)
I have a large text file with 3000000 rows and 1200 columns. I have split it into 15 files of 200000 rows each, but even the smaller files take a very long time to read with dlmread. Is there a way to read these files much faster?
Would load or textscan be any faster than dlmread?
Also, is there a way to read the original file with 3000000 rows directly, without splitting it into smaller files?
Answers (1)
Titus Edelhofer
on 25 Jun 2014
Hi Sivanand,
You will be able to read the original file using fopen and textscan, because textscan can be called in a loop to read the file in chunks (e.g. 100000 lines per iteration). Some remarks:
- 1200 columns is of course a lot. Do you need all of them? If not, use %*f in the format string to skip a numeric column.
- load will not be faster than dlmread.
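To illustrate the %*f idea, here is a minimal sketch. It assumes a comma-separated file; the tiny 4-column sample file and the variable names (keep, specs, kept) are made up for the demonstration and stand in for the real 1200-column data.

```matlab
% Hypothetical 4-column sample file standing in for the real data
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%g,%g,%g,%g\n', [1 2 3 4; 5 6 7 8]');
fclose(fid);

% Keep columns 1 and 3; %*f parses a number but does not store it
keep = [true false true false];
specs = repmat({'%*f'}, 1, numel(keep));
specs(keep) = {'%f'};
formatString = strjoin(specs, '');   % '%f%*f%f%*f'

fid = fopen(fname, 'rt');
data = textscan(fid, formatString, 'Delimiter', ',');
fclose(fid);
delete(fname);

kept = [data{:}];   % one matrix column per kept file column
```

Skipped columns still have to be parsed, so this mainly saves memory rather than parsing time.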
In any case, it will not be really fast, and since the file is that large, it will take up a considerable amount of memory as well. Does the text file change? If not, I would suggest reading the data once (without worrying too much about the time) and then saving it in binary format using save. Subsequent reads will then be significantly faster.
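The read-once-then-save idea could look roughly like the sketch below. The small random matrix and the file name are placeholders, not the real data. The -v7.3 flag is included because older MAT-file versions cannot store a variable larger than 2 GB.

```matlab
% Stand-in for the matrix parsed from the text file (small demo data)
allData = rand(1000, 12);

% Hypothetical file name; -v7.3 allows variables larger than 2 GB
matFile = [tempname '.mat'];
save(matFile, 'allData', '-v7.3');

% Later sessions load the binary file instead of re-parsing the text
loaded = load(matFile, 'allData');
restored = loaded.allData;
delete(matFile);
```

With a -v7.3 file, matfile can also be used to read only part of a variable, which may help if the full matrix does not fit in memory.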
Titus
3 comments
Sivanand
on 26 Jun 2014
Titus Edelhofer
on 26 Jun 2014
Hi,
that would be something like this (assuming a comma-separated file):
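fid = fopen('largedata.txt', 'rt');
if fid == -1, error('Could not open largedata.txt'); end
% Build '%f,%f,...,%f' for 1200 comma-separated numeric columns
formatString = repmat('%f,', 1, 1200);
formatString(end) = [];   % drop the trailing comma
allData = zeros(0, 1200);
while ~feof(fid)
    % Read the next chunk of up to 100000 rows
    data = textscan(fid, formatString, 100000);
    if isempty(data{1}), break; end   % stop if nothing could be parsed
    allData = [allData; [data{:}]];   % append the chunk
end
fclose(fid);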
If you know in advance how many lines the file has, you should of course preallocate allData instead of growing it by concatenation.
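A preallocated variant might look like the following sketch. The 10-by-3 sample file, the chunk size of 4, and the names nRows, nCols, chunk, block and row are assumptions chosen to keep the demo small. For the real file they would be 3000000, 1200 and e.g. 100000.

```matlab
% Hypothetical small sample: 10 rows, 3 columns, comma-separated
nRows = 10; nCols = 3; chunk = 4;
fname = [tempname '.txt'];
expected = reshape(1:nRows*nCols, nCols, nRows)';
fid = fopen(fname, 'wt');
fprintf(fid, '%g,%g,%g\n', expected');
fclose(fid);

formatString = repmat('%f,', 1, nCols);
formatString(end) = [];          % drop the trailing comma

allData = zeros(nRows, nCols);   % allocate the full matrix once
row = 0;                         % number of rows filled so far
fid = fopen(fname, 'rt');
while row < nRows && ~feof(fid)
    data = textscan(fid, formatString, chunk);
    block = [data{:}];
    n = size(block, 1);
    if n == 0, break; end        % nothing parsed: stop
    allData(row+1:row+n, :) = block;   % write the chunk in place
    row = row + n;
end
fclose(fid);
delete(fname);
```

Writing each chunk into its slot avoids the repeated copying that concatenation causes as allData grows.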
Titus
Ken Atwell
on 27 Jun 2014
Do you have enough memory to do all of this? 3000000 x 1200 x 8 bytes is roughly 29 GB of physical memory just to hold the matrix, and you need additional free memory to perform any calculations on it.
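The estimate can be checked with a few lines of arithmetic, assuming the data is stored as double (8 bytes per element). The variable names are only illustrative.

```matlab
nRows = 3000000;
nCols = 1200;
bytesPerDouble = 8;

totalBytes = nRows * nCols * bytesPerDouble;   % 2.88e10 bytes
totalGB  = totalBytes / 1e9;    % decimal gigabytes
totalGiB = totalBytes / 2^30;   % binary gibibytes

fprintf('%.1f GB (%.1f GiB)\n', totalGB, totalGiB);
```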