Extracting data from strings with varying delimiters and column widths

Question

Alastair Temple on 27 Feb 2020

https://la.mathworks.com/matlabcentral/answers/507873-extracting-data-from-strings-with-varying-delimiters-and-column-widths

I have read in the data as a single-column array of strings, stripped out all the extra information, and have each set of node temperatures separated by a "--------------------" row.

My main trouble is the case seen in the final lines of this timestep: when a temperature is 1000 or higher, the space between the node number and the temperature value disappears. The node and temperature columns also have different widths, so the lines can't be split using a single fixed width.

Any help on solving this would be great.

CONVERGENCE HAS BEEN OBTAINED.
 =============================
    TIME =  1646.00000
 TOTAL TEMPERATURES.
 --------------------
 NODE TEMP.  NODE TEMP.  NODE TEMP.  NODE TEMP.  NODE TEMP.
    1  98.7    2  98.7    3  91.6    4  91.6    5  85.1
    6  85.1    7  79.2    8  79.2    9  73.8   10  73.8
   11  68.8   12  68.8   13  64.3   14  64.3   15  60.0
...
  346  20.0  347  20.0  348  20.0  349  20.0  350  20.0
  351 110.2  352 110.2  353 312.0  354 312.0  355 582.7
  356 582.7  357 753.4  358 753.4  359 854.4  360 854.4
  361 932.8  362 932.8  363 999.5  364 999.5  3651050.0
  3661050.0  3671087.7  3681087.7  3691117.9  3701117.9

Alastair Temple on 27 Feb 2020

Can I use readtable on data already in MATLAB (as I need to strip out a lot of extra info first)? Or would I need to read in the original output file, strip out the extra data, write it back to a text file or CSV, and then re-read it using readtable?

Stephen23 on 27 Feb 2020

Edited: Stephen23 on 27 Feb 2020

As far as I can tell readtable was not designed to parse strings/character vectors.

This is quite an oversight, as this was a very useful feature of textscan.
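To illustrate the textscan feature mentioned here, a minimal sketch of parsing text that is already in memory. The variable name chr and the one-pair-per-line layout are assumptions made for the example; the values are taken from the first row of the question's data.

```matlab
% Text held in a character vector (no file involved); chr is a hypothetical name.
chr = sprintf('1 98.7\n2 98.7\n3 91.6\n');

% textscan accepts a character vector in place of a file identifier.
C = textscan(chr, '%f %f');

nodes = C{1};   % [1; 2; 3]
temps = C{2};   % [98.7; 98.7; 91.6]
```

This only works while whitespace separates every value, so on its own it does not handle the fused columns such as 3651050.0.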

Answer 1

Stephen23 on 27 Feb 2020

https://la.mathworks.com/matlabcentral/answers/507873-extracting-data-from-strings-with-varying-delimiters-and-column-widths#answer_417594

Edited: Stephen23 on 27 Feb 2020

You can insert delimiters between those numbers, e.g. using regexprep:

str = regexprep(str,' *(\d+?) *(\d{1,4}\.\d+)','$1,$2,')

Note that regexprep can be applied to a string array or to a cell array of character vectors (no loop required).

Then use textscan or sscanf or whatever you prefer to convert to numeric, e.g. for each line of text:

sscanf(str,'%f,',[1,Inf])
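Putting the two steps together, a sketch under the assumption that the cleaned-up lines are held in a string array. The variable names rows, data and pairs are made up for the example; the two input lines are copied from the question.

```matlab
% The two lines from the question where node and temperature are fused.
str = ["  361 932.8  362 932.8  363 999.5  364 999.5  3651050.0"
       "  3661050.0  3671087.7  3681087.7  3691117.9  3701117.9"];

% Insert a comma after every node number and every temperature.
str = regexprep(str, ' *(\d+?) *(\d{1,4}\.\d+)', '$1,$2,');

% Convert each line to a numeric row vector, then stack the rows.
rows = arrayfun(@(s) sscanf(char(s), '%f,', [1, Inf]), str, ...
                'UniformOutput', false);
data = vertcat(rows{:});            % 2-by-10: node, temp, node, temp, ...

% Rearrange into one [node, temperature] pair per row.
pairs = reshape(data.', 2, []).';   % 10-by-2
% pairs(5,:) is [365 1050], pairs(10,:) is [370 1117.9]
```

The vertcat step assumes every line holds the same number of node/temperature pairs; a shorter final line would need padding or separate handling.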

Stephen23 on 27 Feb 2020

Edited: Stephen23 on 27 Feb 2020

Almost.

In this case the ? actually modifies the preceding quantifier to be lazy (by default quantifiers are greedy, i.e. match as much as they can).

' *(\d+?) *(\d{1,4}\.\d+)'
% *                         zero or more spaces
%  (                        start token 1 (for Node #)
%   \d+                     match one or more digits, but...
%      ?                    lazy match, i.e. match as *few* characters as possible
%       )                   end token 1
%         *                 zero or more spaces
%          (                start token 2 (for Temp #)
%           \d{1,4}         match from one to four digits (default = greedy)
%                  \.       match period character
%                    \d+    match one or more digits
%                       )   end token 2

Note that the regular expression is anchored on the period character. Token 2 will (greedily) match from one to four digits before the period, but no more than that, so even if the space is missing, this limits the integer part of token 2 to a maximum of four digits. Because token 1 uses a lazy quantifier, it collects however many digits remain before token 2. Read more here:

https://www.mathworks.com/help/matlab/matlab_prog/regular-expressions.html#f0-43073
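To see the lazy/greedy interplay on the fused value from the question, here is a small sketch using regexp with the 'tokens' option. The variable names expr, tok, nodeTemp and greedy are made up for the example, and the greedy pattern is included only for comparison.

```matlab
expr = ' *(\d+?) *(\d{1,4}\.\d+)';

% One pair separated by a space, one pair with the space missing.
tok = regexp('  364 999.5  3651050.0', expr, 'tokens');

% tok is a 1-by-2 cell; each cell holds {node, temperature} as text.
nodeTemp = str2double(vertcat(tok{:}));   % [364 999.5; 365 1050]

% For comparison: with a greedy first token, the node number takes as
% many digits as it can and leaves only one digit before the period.
greedy = regexp('3651050.0', '(\d+) *(\d{1,4}\.\d+)', 'tokens', 'once');
% greedy{1} is '365105', greedy{2} is '0.0'
```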

Alastair Temple on 27 Feb 2020

Awesome, thank you again (also a very nice and neat explanation there).
