Read a small selection of data from a large file
I have several large .csv files (up to around 8 GB; the arrays have about 10^5 rows and up to 15k columns) from which I would like to read data. Most of these read operations will only pull 1000 to 10000 data points at a time (generally just a single row of data, or a subset of a row). However, it seems like dlmread is doing something inefficient, since each read operation takes several minutes. Is there a lower-level read function that can do this significantly faster? (It really needs to be orders of magnitude faster; even a 2x speedup isn't going to cut it.) Should I use another format for the data? I thought about building a MySQL database for it, but I have no experience with this. Is MATLAB even the right environment for this sort of thing? Thanks in advance.
Josh
Accepted Answer
More Answers (1)
Ashish Uthama
on 25 May 2011
If you have the option to change the source, or if you plan to use the data over and over again, it might be best to change the format to plain binary, i.e., use fwrite to write the values out as doubles rather than as a text format like CSV. (Unless, of course, each line has a varying number of entries and this structure is integral to your data.)
This would probably be the fastest approach, since it is the simplest. The file size might also be smaller.
You will be able to index into the file to read subsets more easily. You can compute the offset to the (i,j)th element directly, since you know exactly how much space a single double takes in a binary file.
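As a minimal sketch of this idea (assuming the matrix A fits in memory for a one-time conversion; the file name data.bin, the sizes, and the row index are placeholders): write the transpose so each original row is contiguous on disk (fwrite stores data column-major), then fseek to the row's byte offset and fread only that row.

```matlab
% One-time conversion: store the matrix row-by-row as raw doubles.
nRows = 1e5; nCols = 15000;          % adjust to your data
fid = fopen('data.bin', 'w');
fwrite(fid, A.', 'double');          % transpose so each row is contiguous
fclose(fid);

% Later: read row i without touching the rest of the file.
i = 12345;
fid = fopen('data.bin', 'r');
fseek(fid, (i-1)*nCols*8, 'bof');    % 8 bytes per double
row = fread(fid, nCols, 'double').';
fclose(fid);
```

Since fseek jumps straight to the byte offset, the cost of reading one row is independent of the file size, which is where the orders-of-magnitude speedup over dlmread comes from.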
2 Comments
Walter Roberson
on 25 May 2011
That's a good idea.
Josh Warren
on 25 May 2011