Large streaming data direct to file
Hi!
I would like to set up a system to log months' worth of financial JSON WebSocket data to a file.
- The JSON data coming in looks like this {"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}, and there are about 20 messages per second.
- I did tests with FPRINTF writing directly to a .txt file. That works, but the files get really big, about 2 GB per day, because there is no compression.
- I tested different SAVE formats ('-v7' being by far the best) to save a new variable inside a .mat file every 10 minutes. This was a little too slow to keep up with the incoming stream, taking almost a second per save, and processing would not be ideal if I had to load a ton of different variables. But the file size looked very good. (http://undocumentedmatlab.com/blog/improving-save-performance)
- I tried MATFILE to write directly to the file, but I could only append to the end of a file with '-v7.3' .mat files, which makes the file a lot bigger than '-v7' and still takes a little too long.
- I would like a file format with good compression that I can append a new message to quickly. Maybe the HDF5 file format?
I believe I need to serialize the incoming data and save it directly to a file in some compressed form, but I'm not exactly sure how to do that.
- I read through this article and don't understand exactly how to implement it (https://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data). Since this is an older article, is there a more up-to-date way?
- Do I use something like "h5write"? "getByteStreamFromArray"?
- After the file is created with months of data, how do I pull each message, one by one, to process it?
- Is this "Fast serialize/deserialize" in the file exchange the correct path?... I can't figure out how to use it.
Thank you!
Joe
Answers (1)
Jan
on 16 Nov 2018
Edited: Jan
on 16 Nov 2018
You can create the text as a char vector with sprintf instead of fprintf and compress it in RAM before writing it to disk: https://www.mathworks.com/matlabcentral/fileexchange/69388-mkzip . This should avoid the overhead of compressed MAT files.
Maybe it is just the disk access that slows down the processing. If so, try using an SSD instead.
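A minimal sketch of the "build the text with sprintf, compress in RAM, then write" idea, including reading the messages back one by one. Note that this does not use the mkzip submission linked above, since its exact call syntax is not shown here. Instead it swaps in Java's java.util.zip.GZIPOutputStream, which assumes MATLAB is running with the JVM. The example messages, the one-file-per-batch layout, and the file name batch_0001.json.gz are hypothetical choices for illustration, and jsondecode needs R2016b or later.

```matlab
% Hypothetical example messages; in practice these would be collected
% from the websocket callback over some interval.
msgs = {'{"this": "that", "foo": [1,2,3], "bar": ["a", "b", "c"]}', ...
        '{"this": "other", "foo": [4,5,6], "bar": ["d", "e", "f"]}'};

% Build one char vector in memory, one JSON message per line.
txt = sprintf('%s\n', msgs{:});
raw = unicode2native(txt, 'UTF-8');            % uint8 bytes

% Compress in RAM with Java's gzip stream (assumes the JVM is available).
byteOut = java.io.ByteArrayOutputStream();
gzOut   = java.util.zip.GZIPOutputStream(byteOut);
gzOut.write(raw);
gzOut.close();
packed = typecast(byteOut.toByteArray(), 'uint8');   % Java returns int8

% Write the compressed batch to disk in a single call.
batchFile = fullfile(tempdir, 'batch_0001.json.gz'); % hypothetical name
fid = fopen(batchFile, 'w');
fwrite(fid, packed, 'uint8');
fclose(fid);

% Later: decompress a batch and process its messages one by one.
unpacked = gunzip(batchFile, tempdir);   % cell array of extracted file names
fid = fopen(unpacked{1}, 'r');
tline = fgetl(fid);
while ischar(tline)
    msg = jsondecode(tline);             % struct with fields this, foo, bar
    disp(msg.this)
    tline = fgetl(fid);
end
fclose(fid);
```

Each batch ends up as an ordinary gzip file, so it can also be inspected with standard tools outside MATLAB. How many messages to collect per batch is a trade-off: larger batches usually compress better and mean fewer disk writes, but more data is lost if the session crashes before the batch is written.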