How to extract numeric data between string lines?

Federico Geser on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
Hi MATLAB Community
I'm trying to solve this problem, which for sure is not new, but I haven't been able to find a proper solution.
I have a file with several header lines, followed by a lot of data in the following form:
Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00
Number of hit cells: 1
1 7.185244612628594E-05
Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 153, Primary(s) weight 1.0000E+00
Number of hit cells: 0
As shown, sometimes after the "Number of hit cells" line, there are numbers. I would like to extract them in a matrix or array. Is there a way to do this?
I attached an example file. The real file usually contains a lot more data, which I removed to keep the attachment small.
Thank you very much in advance

Accepted Answer

Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
str = fileread('02-2021-Clearance-Box005_fort72.txt');
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)';
tmp = regexp(str,rgx,'match')
tmp = 1x2 cell array
{'1 7.185244612628594E-05'} {'1 2.547905314713717E-04'}
vec = cellfun(@(s)sscanf(s,'%f',[1,Inf]),tmp,'uni',0) % convert to numeric
vec = 1x2 cell array
{1×2 double} {1×2 double}
mat = vertcat(vec{:}) % optional merge into one numeric matrix
mat = 2×2
    1    7.1852e-05
    1    0.00025479
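Note that the lookbehind in the pattern above only matches the first numeric line after each "Number of hit cells" line. If an event can list several hit cells on consecutive lines (an assumption, since the excerpt in the question shows at most one), a line-based filter might be one way to collect all of them. The following is only a sketch: the sample text is built inline, and the event with two hit cells (Event #: 152) and its values are made up for illustration.

```matlab
% Sample text mirroring the layout in the question.
% ASSUMPTION: the event with two hit cells and its values are invented.
str = sprintf('%s\n', ...
    ' Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 0', ...
    ' Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 1', ...
    ' 1 7.185244612628594E-05', ...
    ' Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 2', ...
    ' 3 1.5E-04', ...
    ' 4 2.5E-04');

% Split into lines (handles both \n and \r\n) and trim surrounding whitespace.
lines = strtrim(regexp(str, '\r?\n', 'split'));

% Keep lines that consist of an integer followed by exactly one more token.
% ASSUMPTION: no header line of the real file has this shape.
isData = ~cellfun(@isempty, regexp(lines, '^\d+\s+\S+$', 'once'));

% Convert all kept lines in one sscanf call: one row per hit cell.
mat = sscanf(strjoin(lines(isData), ' '), '%f', [2, Inf]).';
```

With the sample text above, mat has one row per data line: the row from Event #: 151 followed by the two made-up rows.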
4 Comments
Federico Geser on 27 Jan 2021
Hi Stephen!
I think it works, but the test file has 12 MB of data to filter, so it might take a while. I don't know whether this will still work when I get the real results (which may weigh about 100 MB).
Nevertheless, very helpful solution! Thank you!
Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
If there are always exactly two numbers on each of those lines, then this is probably more efficient:
str = fileread('02-2021-Clearance-Box005_fort72.txt');
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)'; % unchanged
tmp = regexp(str,rgx,'match'); % unchanged
mat = sscanf(sprintf(' %s',tmp{:}),'%f',[2,Inf]).'
mat = 2×2
    1    7.1852e-05
    1    0.00025479
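If holding a file of around 100 MB in memory as a single character vector turns out to be a problem, reading it line by line is one possible alternative. The sketch below has not been timed against the regexp approach, so it may or may not be faster; it simply avoids loading the whole file at once. The demo file name and its contents are hypothetical and only serve to make the example self-contained.

```matlab
% Write a small demo file (hypothetical; replace fname with the real path).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    ' Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 0', ...
    ' Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 1', ...
    ' 1 7.185244612628594E-05', ...
    ' Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    ' Number of hit cells: 1', ...
    ' 1 2.547905314713717E-04');
fclose(fid);

% Read the file one line at a time and keep only the numeric data lines.
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open the file.');
mat = zeros(0, 2);                 % grows row by row; preallocate if huge
tline = fgetl(fid);
while ischar(tline)                % fgetl returns -1 at end of file
    % ASSUMPTION: a data line is an integer followed by one more token.
    if ~isempty(regexp(tline, '^\s*\d+\s+\S+\s*$', 'once'))
        vals = sscanf(tline, '%f');
        if numel(vals) == 2
            mat(end+1, :) = vals.'; %#ok<SAGROW>
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);                     % remove the demo file again
```

For the demo file this gives the same 2×2 matrix as the regexp version above.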

Release

R2019b
