How to extract data from .txt file that has both text and numerical data?

Naveen on 17 Dec 2015
Answered: D. Ali on 27 Apr 2019
I have data that I want to extract from multiple txt files. Each txt file is organized in the same way, as seen in the image attached below. I am not interested in the text or headers. I only want to pull out the numerical data in the three columns labeled "A-FRQ", "FRQ-C", and "P-FRQ".
To clarify what I want to do with the data: I have hundreds of these .txt files in a folder. I want to run a loop that combines the columns of data from all of these .txt files. By the end of the process, I want three large arrays in MATLAB (A-FRQ, FRQ-C, and P-FRQ) that contain those respective values for EVERY data file in the folder. Note that the number of data rows differs from file to file. In the picture below, the dataset has 12 rows I'm interested in extracting. In another file, there might be 75 rows of data I'm interested in.
My end goal is three separate histograms for the categories (A-FRQ, FRQ-C, P-FRQ) that account for the data in all the files.
Sorry, I guess I don't really have a specific question since I'm just about to dive into this task right now. I just figured I'd post my problem ahead of time in hope someone can provide tips or advice on how I can efficiently accomplish this conceptually simple task. I can already imagine that my biggest issue will be how to read the .txt file in MATLAB and then extract the columns of data that I am interested in without caring about the headers/text in the file.
  1 comment
jgg on 17 Dec 2015
Hopefully your files all look pretty similar, because this will make things a lot easier. You will probably want the textscan command to do this.
I would start by getting one of your files to read in properly with textscan, then automate the loop over all your files, probably using the dir and fullfile commands.
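A minimal sketch of that two-step approach follows. The file layout here is an assumption (7 header lines, then an index column followed by three numeric columns), since the real layout is only visible in the question's image; the temporary folder and the name sample01.txt are made up for illustration.

```matlab
% Hypothetical sample file: 7 header lines, then an index column followed
% by three numeric columns. Adjust to match the real files.
folder = tempname;
mkdir(folder);
sampleFile = fullfile(folder, 'sample01.txt');
fid = fopen(sampleFile, 'w');
for k = 1:7
    fprintf(fid, 'header line %d\n', k);
end
fprintf(fid, '%d %.2f %.2f %.2f\n', [1 10.5 20.5 30.5; 2 11.5 21.5 31.5]');
fclose(fid);

% Step 1: get a single file to read correctly.
% '%*f' skips the first column; 'CollectOutput' merges the three numeric
% columns into one matrix instead of three separate cells.
fid = fopen(sampleFile, 'r');
cols = textscan(fid, '%*f%f%f%f', 'HeaderLines', 7, 'CollectOutput', true);
fclose(fid);
oneFile = cols{1};   % numeric matrix, one row per data line

% Step 2: list every .txt file in the folder and build full paths.
listing = dir(fullfile(folder, '*.txt'));
for ii = 1:numel(listing)
    filePath = fullfile(folder, listing(ii).name);
    fprintf('Would read: %s\n', filePath);
end
```

Filtering with '*.txt' in the dir call means the '.' and '..' entries never appear in the listing, so the loop can start at 1.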


Answers (2)

Ingrid on 18 Dec 2015
This is not so difficult to achieve.
Just pass the appropriate HeaderLines value as an optional argument to textscan, and it should work fine if you do something like this:
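listing = dir(fullfile(nameFolder, '*.txt')); % only the .txt files, so no '.' and '..' entries to skip
N = numel(listing);
data = [];
for ii = 1:N
    % dir returns a struct array, so the file name is in the .name field
    fid = fopen(fullfile(nameFolder, listing(ii).name));
    % skip 7 header lines and the first column, read the next three columns
    newData = textscan(fid, '%*f%f%f%f', 'HeaderLines', 7, 'CollectOutput', true);
    data = [data; newData{1}]; % newData{1} is a rows-by-3 numeric matrix
    fclose(fid);
end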
  2 comments
Guillaume on 18 Dec 2015
The only thing I would change from Ingrid's answer is the allocation / resizing of data on each file. Instead I'd store each newData matrix into a cell array, and do the concatenation in one go at the end.
This should be more efficient in both memory and time:
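listing = dir(fullfile(nameFolder, '*.txt')); % only the .txt files, so no '.' and '..' entries to skip
N = numel(listing);
data = cell(N, 1);
for ii = 1:N
    fid = fopen(fullfile(nameFolder, listing(ii).name));
    newData = textscan(fid, '%*f%f%f%f', 'HeaderLines', 7, 'CollectOutput', true);
    data{ii} = newData{1}; % store this file's rows-by-3 matrix in its own cell
    fclose(fid);
end
data = vertcat(data{:}); % concatenate all the cells. Only one reallocation instead of hundreds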
Ingrid on 21 Dec 2015
Thanks, Guillaume, for this useful tip. I knew the size of the matrix was going to change on each run, but I did not know how to avoid that when the size is not known beforehand. This is a simple but effective solution.
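To get from the per-file results to the three arrays and histograms asked for in the question, something along these lines might work. This is only a sketch: the random matrices stand in for the data read from the files, the column order (A-FRQ, FRQ-C, P-FRQ) is assumed, and the variable names aFrq, frqC and pFrq are made up because a hyphen is not allowed in a MATLAB variable name.

```matlab
% Stand-ins for the per-file results of the loop: one rows-by-3 numeric
% matrix per file (12 and 75 rows, as in the question). Values are made up.
perFile = {rand(12, 3), rand(75, 3)};

% One concatenation at the end, after all files have been read.
allData = vertcat(perFile{:});

% Assumed column order: A-FRQ, FRQ-C, P-FRQ.
aFrq = allData(:, 1);
frqC = allData(:, 2);
pFrq = allData(:, 3);

% One histogram per category, covering the data from every file.
names = {'A-FRQ', 'FRQ-C', 'P-FRQ'};
figure;
for k = 1:3
    subplot(1, 3, k);
    histogram(allData(:, k));
    title(names{k});
end
```

Because each file contributes a matrix with the same three columns, files with different row counts stack without any extra bookkeeping.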



D. Ali on 27 Apr 2019
I have a similar question: I need to extract all MCAP events, together with the times at which they occurred, into a separate file, and plot them if possible.
I attached the file.
