I have a large number (more than 1e6) of ASCII files (myFile.txt) containing time series data, all in the same format: timestamp, field 1, field 2, ..., field 20. Each data entry is one row, tab-separated. Each of the fields 2-20 is a double. The timestamp is a string (HH:MM:SS.FFF). Each file is roughly 5 GB in size.
I wish to reduce the hard disk storage required. How can I do this?
My thoughts so far are:
1. Convert the files to a binary format. How can I do this? Is dec2bin.m the right tool? That function seems to accept only scalars. What would the conversion look like?
2. Compress each file. Each file may be used independently of the others, so I wish to compress them individually. I know that different compression methods suit different kinds of data. Given the data structure above, which method is best to apply?
Given the importance of this, I would be happy to call code in other languages (e.g. C++) from inside MATLAB. Are there any standard libraries or third-party tools that can be recommended?
3. Any other suggestions?
Finally, an important point: the user must be able to load and access the data in each file quickly, i.e. both the conversion back from binary (the bin2dec() call) and the decompression must be fast.
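
To make option 1 concrete, here is a rough sketch of the kind of conversion I have in mind. It is written in Python purely for illustration (I would be equally happy with MATLAB or C++). The record layout is my own assumption, not something I have settled on: the timestamp is stored as a 4-byte unsigned integer of milliseconds since midnight, and all 20 fields are stored as 8-byte doubles. The function names are hypothetical.

```python
import struct

# Assumed record layout (hypothetical): little-endian uint32 holding
# milliseconds since midnight, followed by 20 float64 values.
RECORD = struct.Struct("<I20d")


def timestamp_to_ms(stamp):
    """Convert an 'HH:MM:SS.FFF' string to integer milliseconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or truncate the fractional part to exactly three digits.
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def pack_line(line):
    """Turn one tab-separated text row into one fixed-size binary record."""
    parts = line.rstrip("\r\n").split("\t")
    values = [float(p) for p in parts[1:21]]  # assumes all 20 fields parse as doubles
    return RECORD.pack(timestamp_to_ms(parts[0]), *values)


def unpack_record(blob):
    """Inverse of pack_line: return (milliseconds, list of 20 floats)."""
    ms, *values = RECORD.unpack(blob)
    return ms, values


def convert_file(text_path, binary_path):
    """Stream a text file row by row into a binary file of fixed-size records."""
    with open(text_path, "r", encoding="ascii") as src, open(binary_path, "wb") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(pack_line(line))
```

With this layout every row occupies exactly RECORD.size = 164 bytes, so row n starts at byte offset 164 * n and can be read without parsing any text. Whether this is actually smaller than the text row depends on how many digits the text files print per value, which is part of what I am asking.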