Quicker way than a for loop for reading columns from different CSV files in the same folder
Hello everyone,
I have a folder containing a large number of files, let's say 10600 CSV files (useful_stator_files). Each CSV file contains a large number of columns (about 100), and the number of rows varies from 10 to 60. I am using the code below:
for stupid_k = 1:length(useful_source_files_stator)
    final_path_stator{stupid_k} = useful_source_files_stator(stupid_k).name; % name of the final path, depending on the CSV files I kept
    % Read the three SIR columns from each useful file
    [SIR_AVG, SIR_MIN, SIR_MAX] = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), ...
        'columns', {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', 'RES_INS_STA_LOG_MAX_PRI'}, ...
        'noHeader', false, 'delimiter', ',');
    % Read the three speed columns from each useful file
    [SPEED_AVG, SPEED_MIN, SPEED_MAX] = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), ...
        'columns', {'SPD_ACT_LOG_AVG_PRI', 'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI'}, ...
        'noHeader', false, 'delimiter', ',');
    % Read the timestamp column from each useful file
    Date_Time = csvimport(fullfile(source_dir, final_path_stator{stupid_k}), ...
        'columns', {'Date_Time_ms'}, 'noHeader', false, 'delimiter', ',');
    % Store this file's columns in the big cell arrays
    Big_SIR_AVG{stupid_k}   = SIR_AVG;
    Big_SIR_MIN{stupid_k}   = SIR_MIN;
    Big_SIR_MAX{stupid_k}   = SIR_MAX;
    Big_SPEED_AVG{stupid_k} = SPEED_AVG;
    Big_SPEED_MIN{stupid_k} = SPEED_MIN;
    Big_SPEED_MAX{stupid_k} = SPEED_MAX;
    Big_Date_Time{stupid_k} = Date_Time;
end
I have a stable path (source_dir) and a path that changes (final_path_stator). I go into each file, extract the columns I want, and store them in cell arrays, since they are double vectors or string vectors of varying length. I took the csvimport function from here:
It works, but all this takes a lot of time. I am also reading another 4-5 signals besides the 7 shown in the code, but that is the idea.
Answers (1)
Guillaume
on 4 Dec 2015
Parsing 10600 text files is always going to be slow, particularly on Windows, which tends to struggle with that many files in a single directory. File I/O is probably the major bottleneck in what you're doing, and there's not much you can do about it short of using a more efficient form of storage for your data.
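One such option (a sketch, not from the original answer): parse everything once, then cache the result in a MAT-file, so later runs load binary data instead of re-parsing 10600 CSVs. The cache file name here is hypothetical:
matCache = fullfile(source_dir, 'stator_cache.mat');  % hypothetical cache file name
if exist(matCache, 'file')
    load(matCache);  % fast path: reload previously parsed data
else
    % ... run the CSV parsing loop from the question here ...
    save(matCache, 'Big_SIR_AVG', 'Big_SIR_MIN', 'Big_SIR_MAX', ...
        'Big_SPEED_AVG', 'Big_SPEED_MIN', 'Big_SPEED_MAX', 'Big_Date_Time');
end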
Parsing the same files three times (three calls to csvimport per file) is certainly not going to help. There's also no guarantee that the csvimport code was written optimally (after a quick look, the file-reading part certainly isn't efficient). You would be much better off using csvread (which comes with MATLAB) only once per file and doing the splitting into individual columns yourself (assuming that step is even necessary).
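A minimal sketch of the one-read-per-file idea. It swaps in readtable (also ships with MATLAB) rather than the csvread suggested above, because csvread reads only numeric data and these files have a header row and a Date_Time_ms column:
% One read per file, then split into the named columns yourself
T = readtable(fullfile(source_dir, final_path_stator{stupid_k}), 'Delimiter', ',');
SIR_AVG   = T.RES_INS_STA_LOG_AVG_PRI;
SIR_MIN   = T.RES_INS_STA_LOG_MIN_PRI;
SIR_MAX   = T.RES_INS_STA_LOG_MAX_PRI;
SPEED_AVG = T.SPD_ACT_LOG_AVG_PRI;
SPEED_MIN = T.SPD_ACT_LOG_MIN_PRI;
SPEED_MAX = T.SPD_ACT_LOG_MAX_PRI;
Date_Time = T.Date_Time_ms;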
Preallocating your Big_* cell arrays would also help marginally.
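A sketch of that preallocation, using the variable names from the question, placed before the loop:
numFiles = numel(useful_source_files_stator);
Big_SIR_AVG   = cell(1, numFiles);  % one cell per file
Big_SIR_MIN   = cell(1, numFiles);
Big_SIR_MAX   = cell(1, numFiles);
Big_SPEED_AVG = cell(1, numFiles);
Big_SPEED_MIN = cell(1, numFiles);
Big_SPEED_MAX = cell(1, numFiles);
Big_Date_Time = cell(1, numFiles);
final_path_stator = cell(1, numFiles);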
2 comments
Guillaume
on 4 Dec 2015
Whichever function you use, the biggest and simplest speed-up you can make is to read each file once instead of three times: ask for all your columns at once rather than in three separate calls to the reading function.
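For example, with the same csvimport function used in the question (which, as the question's own calls show, returns one output per requested column), all seven columns can be requested in a single call:
% One parse per file instead of three
[SIR_AVG, SIR_MIN, SIR_MAX, SPEED_AVG, SPEED_MIN, SPEED_MAX, Date_Time] = ...
    csvimport(fullfile(source_dir, final_path_stator{stupid_k}), ...
        'columns', {'RES_INS_STA_LOG_AVG_PRI', 'RES_INS_STA_LOG_MIN_PRI', ...
                    'RES_INS_STA_LOG_MAX_PRI', 'SPD_ACT_LOG_AVG_PRI', ...
                    'SPD_ACT_LOG_MIN_PRI', 'SPD_ACT_LOG_MAX_PRI', ...
                    'Date_Time_ms'}, ...
        'noHeader', false, 'delimiter', ',');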