Quicker way than a for loop for reading columns from different CSV files in the same folder

Hello everyone,
I have a folder containing a large number of files, let's say 10,600 CSV files (useful_stator_files). Each CSV file contains a large number of columns (about 100, let's say), and the number of rows varies from 10 to 60. I am using the code below:
for stupid_k = 1:length(useful_source_files_stator)
    final_path_stator{stupid_k} = useful_source_files_stator(stupid_k).name; % take the name of the final path depending on the csv files I kept
    [SIR_AVG, SIR_MIN, SIR_MAX] = csvimport(sprintf('%s%s', source_dir, '\', final_path_stator{stupid_k}), 'columns', {'RES_INS_STA_LOG_AVG_PRI','RES_INS_STA_LOG_MIN_PRI','RES_INS_STA_LOG_MAX_PRI'}, 'noHeader', false, 'delimiter', ','); % read these 3 columns from each useful file
    [SPEED_AVG, SPEED_MIN, SPEED_MAX] = csvimport(sprintf('%s%s', source_dir, '\', final_path_stator{stupid_k}), 'columns', {'SPD_ACT_LOG_AVG_PRI','SPD_ACT_LOG_MIN_PRI','SPD_ACT_LOG_MAX_PRI'}, 'noHeader', false, 'delimiter', ','); % read these 3 columns from each useful file
    [Date_Time] = csvimport(sprintf('%s%s', source_dir, '\', final_path_stator{stupid_k}), 'columns', {'Date_Time_ms'}, 'noHeader', false, 'delimiter', ','); % read this column from each useful file
    Big_SIR_AVG{:,stupid_k}   = SIR_AVG;   % update the big cell array
    Big_SIR_MIN{:,stupid_k}   = SIR_MIN;   % update the big cell array
    Big_SIR_MAX{:,stupid_k}   = SIR_MAX;   % update the big cell array
    Big_SPEED_AVG{:,stupid_k} = SPEED_AVG; % update the big cell array
    Big_SPEED_MIN{:,stupid_k} = SPEED_MIN; % update the big cell array
    Big_SPEED_MAX{:,stupid_k} = SPEED_MAX; % update the big cell array
    Big_Date_Time{:,stupid_k} = Date_Time;
end
I have a fixed path (source_dir) and a part that changes (final_path); I go into each file, pull out the columns I want, and store them in cell arrays, since they are double vectors or string vectors. I took the csvimport function from the File Exchange.
It works, but all this takes a lot of time. I am also reading another 4-5 signals apart from the 7 shown in the code, but that is the idea.

Answers (1)

Guillaume on 4 Dec 2015
Parsing 10,600 text files is always going to be slow, particularly on Windows, which probably struggles with that many files in a single directory. File I/O is probably the major bottleneck in what you're doing, and there's not much you can do about it short of using a more efficient form of storage for your data.
Parsing the same file three times (three calls to csvimport per file) is certainly not going to help either. There's no guarantee that the csvimport code has been written optimally (after a quick look, the file-reading part certainly isn't efficient). You would be much better off calling csvread (which comes with MATLAB) only once per file and doing the splitting into individual columns yourself (assuming that step is even necessary).
Preallocating your Big_* cell arrays would also help marginally.
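A minimal sketch of the "read each file once" idea, still using csvimport with the calling pattern from the question (all seven columns requested in a single call) and with the cell arrays preallocated. The variable and column names come from the question, and csvimport is assumed to distribute seven requested columns over seven outputs just as it does with three in the original code:
% Sketch (untested): one csvimport call per file instead of three,
% with preallocated cell arrays.
nFiles = numel(useful_source_files_stator);
Big_SIR_AVG   = cell(1, nFiles);
Big_SIR_MIN   = cell(1, nFiles);
Big_SIR_MAX   = cell(1, nFiles);
Big_SPEED_AVG = cell(1, nFiles);
Big_SPEED_MIN = cell(1, nFiles);
Big_SPEED_MAX = cell(1, nFiles);
Big_Date_Time = cell(1, nFiles);
for k = 1:nFiles
    fname = fullfile(source_dir, useful_source_files_stator(k).name); % same path as the sprintf in the question
    [SIR_AVG, SIR_MIN, SIR_MAX, SPEED_AVG, SPEED_MIN, SPEED_MAX, Date_Time] = ...
        csvimport(fname, 'columns', {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', ...
                                     'RES_INS_STA_LOG_MAX_PRI', 'SPD_ACT_LOG_AVG_PRI', ...
                                     'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI', ...
                                     'Date_Time_ms'}, 'noHeader', false, 'delimiter', ',');
    Big_SIR_AVG{k}   = SIR_AVG;
    Big_SIR_MIN{k}   = SIR_MIN;
    Big_SIR_MAX{k}   = SIR_MAX;
    Big_SPEED_AVG{k} = SPEED_AVG;
    Big_SPEED_MIN{k} = SPEED_MIN;
    Big_SPEED_MAX{k} = SPEED_MAX;
    Big_Date_Time{k} = Date_Time;
end
The same single-call pattern extends to the extra 4-5 signals mentioned in the question.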
  2 Comments
Christos Antonakopoulos on 4 Dec 2015
Thank you.
Yes, the preallocation is already done. The csvimport function is needed because there are many string values in the files; if I am not wrong, csvread does not handle those. Unfortunately, I cannot change the way the data are stored.
Guillaume on 4 Dec 2015
Whichever function you use, the biggest and simplest speed up you can make is to read each file once instead of three times. So ask for your columns all at once rather than in three different calls to the reading function.
If csvread does not work, other options are textscan, which requires a bit more work on your part (you have to open and close the file yourself), or readtable, which is dead simple to use but comes with the overhead of tables.
Or you could just parse each file yourself with regexp as I showed you in one of your questions.
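For illustration, a readtable-based sketch of the same one-pass approach (untested): it assumes source_dir and useful_source_files_stator exist as in the question and that the header row of each CSV contains exactly the column names used there.
% Sketch (untested): read each file once with readtable and keep only the
% columns of interest; readtable handles mixed numeric/text columns.
nFiles   = numel(useful_source_files_stator);
Big_Data = cell(1, nFiles);                 % one table per file
wanted   = {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', 'RES_INS_STA_LOG_MAX_PRI', ...
            'SPD_ACT_LOG_AVG_PRI', 'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI', 'Date_Time_ms'};
for k = 1:nFiles
    fname = fullfile(source_dir, useful_source_files_stator(k).name);
    T = readtable(fname, 'Delimiter', ',');  % header row becomes the table's variable names
    Big_Data{k} = T(:, wanted);              % keep only the seven columns of interest
end
Keeping one table per file avoids the seven parallel cell arrays; individual columns can still be pulled out later, e.g. Big_Data{k}.RES_INS_STA_LOG_AVG_PRI.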
