MATLAB functions str2double and strsplit taking a long time

Arup Ghosh on 9 Jan 2018
Edited: Stephen23 on 9 Jan 2018
I have a comma-separated text file like this:
number1, number2, number3, number4
number5, number6, number7, number8
number9, number10, number11, number12
.....................................
[All the numbers are doubles.]
I am reading it line by line.
Then splitting the line into parts (',' is the delimiter) and converting the parts into double numbers.
The code is given below:
str2double(strsplit(line,','));
The input file is very big. It has >100000 lines and each line has >200 parts (numbers).
The Profiler shows that the above code takes a long time to execute.
How can I replace the above code so that it executes much faster?
I want to read the file line by line. I do not want to read the whole file into a single matrix using csvread.
Thanks in advance.
2 comments
Stephen23 on 9 Jan 2018
Why not just use csvread? If the file is comma separated and contains only numbers, then why waste time writing buggy code when csvread already exists?
Arup Ghosh on 9 Jan 2018
The file is very big. I want to work on it line by line (not on the whole matrix). So reading the complete file in one go is basically useless.


Accepted Answer

Stephen23 on 9 Jan 2018
Edited: Stephen23 on 9 Jan 2018
The approach of reading each line and then using strsplit and str2double will be slow, because those functions are inherently much more complex than what you require.
method one: sscanf:
One simpler, much faster alternative would be to use sscanf, which may be enough for your needs:
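fmt = repmat(',%f',1,N); % N == number of columns
fmt = fmt(2:end);        % drop the leading comma, giving '%f,%f,...,%f'
...
while ~feof(fid)
    ...
    S = fgetl(fid);      % read one line as char, without the newline
    V = sscanf(S,fmt);   % N-by-1 column vector of doubles
end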
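For reference, a complete version of the sscanf loop might look like the sketch below. The temporary sample file, deriving N from the first line, the ischar end-of-file test, and the per-line rowSums computation are assumptions added for illustration; they are not part of the snippet above.

```matlab
% Write a small sample file so the sketch is self-contained (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','1.5, 2, 3','4, 5.25, 6');
fclose(fid);

fid = fopen(fname,'r');
first = fgetl(fid);
N = numel(strfind(first,',')) + 1;   % number of columns = commas + 1
frewind(fid);                        % go back to the start of the file

fmt = repmat(',%f',1,N);
fmt = fmt(2:end);                    % '%f,%f,%f' for N == 3

rowSums = [];                        % placeholder for per-line work
while true
    S = fgetl(fid);
    if ~ischar(S), break; end        % fgetl returns -1 at end of file
    V = sscanf(S,fmt);               % N-by-1 double; %f skips leading spaces
    rowSums(end+1) = sum(V);         %#ok<AGROW>
end
fclose(fid);
delete(fname);
```

Because %f skips leading whitespace, the same format handles both '1,2,3' and '1, 2, 3'.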
method two: textscan and blocks of data:
One way to read a large file efficiently is to use textscan inside a loop to read blocks of numeric data as numeric data (importing as char and then converting to numeric is, in general, slow code). Use textscan's optional third input to specify the number of rows per block. How to read blocks of data is explained clearly in the MATLAB documentation:
If you know the file format in advance then it is trivial to write a format string to suit. If the number of columns can vary then you can read the first line, calculate the columns to generate the format string, then use frewind to go back to the start of the file and start reading the blocks of data using textscan.
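The block-reading approach described above could be sketched roughly as follows. The sample file, the block size blockRows, and the running totals are hypothetical and exist only to keep the example self-contained; the 'CollectOutput' option is used so that each block comes back as a single numeric matrix.

```matlab
% Write a small sample file: 4 rows of 3 integers (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%d,%d,%d\n',1:12);
fclose(fid);

fid = fopen(fname,'r');
first = fgetl(fid);                  % read the first line to count columns
N = numel(strfind(first,',')) + 1;
frewind(fid);                        % return to the start before block reading

fmt = repmat('%f',1,N);              % one %f per column
blockRows = 3;                       % rows per block (assumed value)
total = 0;
nRows = 0;
while ~feof(fid)
    C = textscan(fid,fmt,blockRows,'Delimiter',',','CollectOutput',true);
    M = C{1};                        % up to blockRows-by-N numeric matrix
    if isempty(M), break; end        % nothing more could be read
    nRows = nRows + size(M,1);
    total = total + sum(M(:));       % placeholder per-block work
end
fclose(fid);
delete(fname);
```

Setting blockRows to 1 would process the file one row at a time while still importing the data directly as numeric.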
See these threads to see working examples:
method three: datastore:
Depending on your MATLAB version you might also like to consider using tall arrays, which are a special kind of data type especially for working with very large data that cannot be read into memory:
or methods like datastore for working with large files:
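As a rough sketch of the datastore route, assuming a release that provides tabularTextDatastore (R2016a or later) and a file with no header row, chunked reading might look like this. The sample file, the ReadSize value, and the running totals are hypothetical.

```matlab
% Write a small sample file: 4 rows of 3 integers (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'%d,%d,%d\n',1:12);
fclose(fid);

% The file has no header line, so do not treat the first row as names.
ds = tabularTextDatastore(fname,'ReadVariableNames',false);
ds.ReadSize = 3;                     % rows returned per call to read (assumed value)

total = 0;
nRows = 0;
while hasdata(ds)
    T = read(ds);                    % table with up to ReadSize rows
    M = table2array(T);              % numeric matrix for this chunk
    nRows = nRows + size(M,1);
    total = total + sum(M(:));       % placeholder per-chunk work
end
delete(fname);
```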
