Reading csv files ignoring header info and text in between

Question

Iroquois Pliskin el 30 de Mzo. de 2021

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/788274-reading-csv-files-ignoring-header-info-and-text-in-between

Comentada: Iroquois Pliskin el 1 de Ag. de 2021

Respuesta aceptada: Allen

HI,

I have two types of CSV files which i extract from a measurement instrument.

The first one(Dirty_Complex_Condensed.csv) , attached with this question has

Header text data which i would like to ignore(lines 1-5)
Some integers which i would like to not ignore(lines 6-34)
Complex decimal data which i would like to not ignore(lines 21-35) or really need.(measurement)

The second one(Dirty_Supplement_Condensed.csv) also attached has

Header text data which i would like to ignore(lines 1-5)
Some integers which i would like to not ignore(lines 6-34) or really need(measurement)
Header text data which i would like to ignore(lines 21-27)
Some integer which i would like to not ignore(lines 9-34) and append to the earlier integer data read(lines 6-34).(measurement)

Also note that the location(line number) of these header and measurement data is not always constant because it depends on the type of conditions of the data being mesaured. So how do i take that consideration and make my code flexible so that i am not hard-coding the 'row' or 'line number' for different csv read in?

I am attaching 'Clean_Complex_Condensed.csv' and 'Clean_Supplement_Condensed.csv' for your reference on how i would like the data be eventually as i read them in. The number of rows in the imported 'Cleaned' version of the csv's should always be the same between the two files.

Currently, i am manually going in and deleting these header text data and then using 'csvread' command , but undoubtedly, this is too time consuming.

I have checked the following previous links which are relevent but not exactly or quite close to what am trying to accomplish here.

https://www.mathworks.com/matlabcentral/answers/281466-csvread-file-containing-text

https://www.mathworks.com/matlabcentral/answers/161434-reading-only-numeric-data-in-a-text-file

https://www.mathworks.com/matlabcentral/answers/415836-how-to-use-textscan-to-read-my-2nd-column-and-ignore-the-string-or-non-numerical-values?s_tid=ta_ans_results

I guess the most powerful way is to use 'regex' but before i break my head further i wanted to know if any of the clever cogs MVPs /MathWorks staff had any fast and bright suggestions for a relatively simple question i have.

Thanks,

IP

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Allen el 25 de Mayo de 2021

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/788274-reading-csv-files-ignoring-header-info-and-text-in-between#answer_708855

Abrir en MATLAB Online

Iroquois,

The line numbers that you are referencing do not quite match up with your descriptions, and is making your needs a bit hard to decipher. However, to possibly get you started I have provided the following snippets.

1st: Start by reading in a portion of the data to enough lines to capture all of the splits in data type. This is quicker than reading the entire file, especially for large data files. The example below assumes that the last change in data type will occur before line 100 (corresponds to row number in Excel).

% Opens the *.csv file and read the first 100 lines only
file = "path\filename for your data file";
fid = fopen(file);
C = textscan(fid,"%s",100,"delimiter","\n"); % Should be a cell-array of cells. Will only need C{1}

C{1} should be a cell-array containing lines of text, which also includes numerical values as text.

2nd: Run a number of searches on C{1} to find the row indices where data changes to the type(s) you want and the type(s) you do not want. The following example looks for and returns line number that contain the text "DataLabelIdxs". Then looks for the first line containing single column values. In the Dirty_Complex_Condensed.csv file you provided this finds the first line after your integer values.

txt = "DataLabelIdxs";
index1 = find(contains(C{1},txt),1);
% Last integer line from this example dataset is also the last line to not
% contain a comma.
index2 = index2 = find(~contains(C{1}(index1+1:end),","),1,"last");

Applying additional searches will help you find other text lines below this and when you change from integer to floating doubles, but will be a bit more complicated. You may also want to program what to do if a search returns an empty value.

3rd: Rewind the FID line counter and use your indices to read the sets of values you wish to keep. The following simply reads in values up to the line as integers starting after the header lines marked by index1

frewind(fid)
% Converts the cell-arrays of numbers that are being read to numeric-arrays
intValues = cell2mat(textscan(fid,"%d",index2,"headerlines",index1));
dblValues = cell2mat(textscan(fid,"%f%f"));
% Close FID
fclose("all");

You may need to use the frewind function multiple times depending on how you generate your indices, or if you are reading all integer values first and have floating double values between sets of your integers.

Also, if it is possible that the floating double lines may contain more than two columns of data and again is a variable number of columns you can use the following to generate a variable-sized expression for textscan.

% Assuming index is a row along the data containing the floating double
% values.
expr = strcat("%f",repmat("%f",[1,sum(C{1}{index}==',')]));
% Then you can use textscan with this variable expression input similarly
% to before.
dblValues = cell2mat(textscan(fid,expr))

Hope this helps.

-Allen

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Iroquois Pliskin el 1 de Ag. de 2021

Thank you so much for your response Allen! It helps. Appreciate it!

Iniciar sesión para comentar.

Reading csv files ignoring header info and text in between

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Reading csv files ignoring header info and text in between

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos