Big data processing, Datastore function

Vincent Thevenot on 31 May 2015
Answered: Aaditya Kalsi on 1 Jun 2015
Hi,
I have to process large files containing three tab-separated columns and several million rows, which is too much to load into memory at once. I tried the "datastore" function, and it works very well, but MATLAB returns an error when the file contains more than 594000 rows.
Here is the message:
Error using matlab.io.datastore.TabularTextDatastore/read (line 41)
The data in Files does not appear to be tabular, with the same number of fields in each row and in each column. Verify the Text Format and Advanced Text Format
Properties.
Error in test_datastore (line 17)
s=read(ds);
It looks like a format problem, but I tried different parts of the file, and MATLAB always returns this message when there are more than 594000 rows.
Here is my code (very simple, just to test the function):
ds=datastore('essai_RM7_1_test_3.txt','ReadVariableNames',0,'TextscanFormats',{'%q','%f','%f'},'RowDelimiter',' ');
ds.RowsPerRead = 100000;
count = 0;
while hasdata(ds)
s=read(ds);
count = count + 1
end
count
Here are some rows of the file:
24/04/2015 09:58:06.220351 -1.143072E-2 1.277841E-1
24/04/2015 09:58:06.220957 2.736964E-3 9.289337E-2
24/04/2015 09:58:06.221562 -7.244674E-3 3.169246E-2
24/04/2015 09:58:06.222167 2.487282E-2 -6.050338E-2
24/04/2015 09:58:06.222773 1.344811E-1 -1.312878E-1
24/04/2015 09:58:06.223378 7.464026E-2 -1.944335E-1
24/04/2015 09:58:06.223984 -6.966816E-2 -2.088179E-1
24/04/2015 09:58:06.224589 -5.196927E-2 -1.842140E-1
24/04/2015 09:58:06.225195 6.998909E-2 -1.819939E-1
So, has anybody encountered this kind of problem? Is there a different way to deal with such a big file? I have to perform various calculations (FFT, RMS, …).
Thanks in advance for your help.

Answers (1)

Aaditya Kalsi on 1 Jun 2015
It seems like there is an issue with the data within the file at around row 594000. You could try:
while hasdata(ds)
    [s, info] = read(ds);
    disp(info); % display the current read state
    count = count + 1
end
This will tell you where the last successful read was.
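To make the idea of "last successful read" concrete, here is a minimal sketch that wraps the loop in try/catch and keeps the info returned by the most recent read that worked. It is only an illustration under stated assumptions: the small temporary file stands in for the real data, the 'Delimiter','\t' option assumes the columns are tab-separated, and the names lastInfo and rowsRead are made up for this example.

```matlab
% Build a small tab-separated sample file (hypothetical stand-in for the real data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
for k = 1:5
    fprintf(fid, '24/04/2015 09:58:06.22%04d\t%e\t%e\n', k, 0.1*k, -0.1*k);
end
fclose(fid);

% Assumption: the columns are separated by tabs.
ds = datastore(fname, 'ReadVariableNames', false, 'Delimiter', '\t', ...
    'TextscanFormats', {'%q', '%f', '%f'});

lastInfo = [];   % info struct returned by the last read that succeeded
rowsRead = 0;    % number of rows read successfully so far
try
    while hasdata(ds)
        [s, info] = read(ds);
        lastInfo = info;
        rowsRead = rowsRead + height(s);
    end
    fprintf('All %d rows were read without error.\n', rowsRead);
catch err
    % The failing chunk starts right after the rows counted in rowsRead.
    fprintf('Read failed after %d rows: %s\n', rowsRead, err.message);
    disp(lastInfo);
end
delete(fname);   % remove the temporary sample file
```

With the real file, replace fname with its path and drop the file-creation and delete lines. If a read fails, rowsRead gives the row count reached before the problematic chunk, which narrows down where to look in the file.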
I suspect that the second file is formatted differently from the first, and that this is the cause of the error you are seeing.
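One way to check for a formatting difference without loading the whole file is to scan it line by line and count the fields in each row. The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it assumes tab-separated columns and three expected fields, and it uses a small temporary file with one deliberately malformed row in place of the real data. The names expectedFields and badRows are hypothetical.

```matlab
% Build a small sample file; row 3 is deliberately missing one field.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '24/04/2015 09:58:06.220351\t-1.143072E-2\t1.277841E-1\n');
fprintf(fid, '24/04/2015 09:58:06.220957\t2.736964E-3\t9.289337E-2\n');
fprintf(fid, '24/04/2015 09:58:06.221562\t-7.244674E-3\n');
fprintf(fid, '24/04/2015 09:58:06.222167\t2.487282E-2\t-6.050338E-2\n');
fclose(fid);

expectedFields = 3;       % assumption: every valid row has three columns
tabChar = sprintf('\t');  % assumption: columns are separated by tabs
badRows = [];             % row numbers whose field count differs
rowNum = 0;

fid = fopen(fname, 'r');
txt = fgetl(fid);         % fgetl returns -1 (not char) at end of file
while ischar(txt)
    rowNum = rowNum + 1;
    nFields = nnz(txt == tabChar) + 1;   % fields = number of tabs + 1
    if nFields ~= expectedFields
        badRows(end+1) = rowNum; %#ok<AGROW>
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);            % remove the temporary sample file

fprintf('Checked %d rows, %d with an unexpected field count.\n', ...
    rowNum, numel(badRows));
disp(badRows);
```

On the sample file this flags row 3 only. Only one line is held in memory at a time, so the same loop should work on a file with several million rows, although it will be slow. Empty lines are also reported, since they count as a single field.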
