Extracting string and number pairs from a mixed string
LukasJ
on 13 Feb 2020
Dear all,
I have read in material data and obtained an array of chemical compositions of type char, such as
'Fe61Zr8Co7Mo15B7Y2' and sometimes even '(Fe0.5Co0.5)58.5Cr14Mo6C15B6Er0.5', in which Fe and Co each have a percentage of 29.25 (58.5*0.5).
How do I extract the chemical composition so that I receive an array (or two) like
['Fe', 'Zr', 'Co', 'Mo', 'B', 'Y'; 61, 8, 7, 15, 7, 2]?
I am somehow failing to use sscanf correctly. For example, numbers = sscanf('Fe61Zr8Co7Mo15B7Y2', '%f') returns nothing, because I need to skip the varying element symbols between the numbers :(
Thanks a lot in advance!
Greetings
Lukas
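For reference, the sscanf call above returns an empty result because sscanf stops reading at the first character that does not fit the format, here the leading 'F'. For compositions without brackets, a minimal sketch using regexp with the 'tokens' option could look like this. It assumes that each element symbol is one capital letter followed by optional lowercase letters and that each amount is an integer or decimal number; the variable names symbols, amounts and layout are illustrative only:

```matlab
% Minimal sketch for compositions WITHOUT brackets, e.g. 'Fe61Zr8Co7Mo15B7Y2'.
% Assumption: symbol = capital letter + optional lowercase letters,
%             amount = integer or decimal number.
str = 'Fe61Zr8Co7Mo15B7Y2';

tok = regexp(str, '([A-Z][a-z]*)(\d+\.?\d*)', 'tokens');
tok = vertcat(tok{:});                   % N-by-2 cell: {symbol, number text}

symbols = tok(:,1).';                    % 1-by-N cell of char vectors
amounts = str2double(tok(:,2)).';        % 1-by-N numeric row vector

% A char array cannot hold symbols and numbers side by side (numbers would be
% converted to characters), so the two-row layout from the question is built
% as a cell array: row 1 = symbols, row 2 = amounts.
layout = [symbols; num2cell(amounts)];
```

Keeping symbols and amounts as two separate variables is usually more convenient for further calculations than the mixed cell array.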
Accepted Answer
Stephen23
on 13 Feb 2020
Edited: Stephen23
on 15 Jun 2020
The two lines marked with %%% are used to convert substrings like '(Fe0.5Co0.5)58.5' into 'Fe29.25Co29.25', after which the designator+number matching and extraction is easy:
>> str = '(Fe0.5Co0.5)58.5Cr14Mo6C15B6Er0.5';
>> baz = @(s,b)regexprep(s,'(\d+\.?\d*)','${num2str(str2double($1)*str2double(b))}'); %%%
>> tmp = regexprep(str,'\((([A-Z][a-z]*\d+\.?\d*)+)\)(\d+\.?\d*)','${baz($1,$2)}'); %%%
>> out = regexp(tmp,'([A-Z][a-z]*)(\d+\.?\d*)','tokens');
>> out = vertcat(out{:})
out =
'Fe' '29.25'
'Co' '29.25'
'Cr' '14'
'Mo' '6'
'C' '15'
'B' '6'
'Er' '0.5'
>> str2double(out(:,2)) % optional
ans =
29.25
29.25
14
6
15
6
0.5
The code uses two layers of dynamic regular expressions: the first calls baz for each '(AxBy...)N' substring, then inside baz each of x, y, etc. is multiplied by N. The result is converted to char for reinsertion.
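To see the two layers separately, baz can be called directly with the contents of one bracket group and its multiplier, and tmp can be displayed before the final split. This sketch reuses the two %%% expressions unchanged and only shows the intermediate results (the variable name inner is illustrative); it relies on the ${...} dynamic expressions of MATLAB's regexprep:

```matlab
str = '(Fe0.5Co0.5)58.5Cr14Mo6C15B6Er0.5';

% Inner layer: multiply every number in a group's contents s by the multiplier b.
baz = @(s,b)regexprep(s,'(\d+\.?\d*)','${num2str(str2double($1)*str2double(b))}');
inner = baz('Fe0.5Co0.5', '58.5')   % -> 'Fe29.25Co29.25'

% Outer layer: find each '(...)N' group and replace it with baz(contents, N).
tmp = regexprep(str,'\((([A-Z][a-z]*\d+\.?\d*)+)\)(\d+\.?\d*)','${baz($1,$2)}')
% -> 'Fe29.25Co29.25Cr14Mo6C15B6Er0.5'
```

After this step the string contains only symbol+number pairs, which is why the final regexp call needs no special handling for brackets.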
Stephen23
on 15 Jun 2020
Edited: Stephen23
on 15 Jun 2020
The reason is that I missed a question mark here:
tmp = regexprep(str,'\((([A-Z][a-z]*\d+\.?\d*)+)\)(\d+\.?\d*)','${baz($1,$2)}')
% one '?' after a '\.' was missing here (now added)
which meant that the regular expression did not match integer numbers, only numbers with a decimal point.
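The effect of that single '?' can be checked directly on a few sample numbers. This small sketch anchors both patterns with ^ and $ so that only whole strings count as a match:

```matlab
samples = {'58', '58.5', '7'};

% Without '?': the decimal point is mandatory, so integers do not match.
withoutQ = cellfun(@(s) ~isempty(regexp(s, '^\d+\.\d*$', 'once')), samples);

% With '?': the decimal point is optional, so integers and decimals both match.
withQ = cellfun(@(s) ~isempty(regexp(s, '^\d+\.?\d*$', 'once')), samples);
```

Here withoutQ is [false true false] and withQ is [true true true].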
With that question mark in place (I have now corrected the question and comments), this is the output:
>> out = regexp(tmp,'([A-Z][a-z]*)(\d+\.?\d*)','tokens');
>> out = vertcat(out{:})
out =
'Fe' '6550.4'
'B' '2208'
'Y' '441.6'
'Nb' '8'
"Another issue are compositions which contain more "subcompositions" and are separated by other brackets."
Regular expressions alone are not really suitable for this. For parsing arbitrarily nested brackets like that you will probably have to write your own string parser, e.g. based on a recursive function which uses regular expressions or other string manipulation inside it. Have you worked with recursive functions before?
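As an alternative to a recursive function, nested brackets can also be handled without recursion by repeatedly expanding the innermost bracket group, using the same multiply-and-reinsert idea as baz, until no brackets remain. The sketch below assumes that every bracket group is directly followed by its multiplier, that every element inside a group has an explicit amount, and that [] and {} mean the same as (). The example formula is made up for illustration:

```matlab
% Sketch: iterative innermost-first expansion of nested bracket groups
% (a swapped-in alternative to the recursive parser suggested above).
str = '[(Fe0.5Co0.5)0.8Cr0.2]50Mo30B20';   % hypothetical nested example

% Assumption: [] and {} are treated exactly like ().
str = strrep(strrep(str, '[', '('), '{', '(');
str = strrep(strrep(str, ']', ')'), '}', ')');

numPat   = '\d+\.?\d*';                   % integer or decimal number
pairPat  = ['([A-Z][a-z]*)(' numPat ')']; % element symbol + amount
innerPat = ['\(([^()]*)\)(' numPat ')'];  % innermost group + its multiplier

while true
    % Find the first group that contains no further brackets.
    [tok, s, e] = regexp(str, innerPat, 'tokens', 'start', 'end', 'once');
    if isempty(tok)
        break                             % no bracket groups left
    end
    factor = str2double(tok{2});
    pairs  = regexp(tok{1}, pairPat, 'tokens');
    pairs  = vertcat(pairs{:});           % N-by-2 cell: {symbol, number text}
    scaled = str2double(pairs(:,2)) * factor;

    % Rebuild the group as plain symbol+number pairs and reinsert it.
    expanded = '';
    for k = 1:numel(scaled)
        expanded = [expanded, pairs{k,1}, sprintf('%.15g', scaled(k))]; %#ok<AGROW>
    end
    str = [str(1:s-1), expanded, str(e+1:end)];
end

% The string now contains only symbol+number pairs.
out = regexp(str, pairPat, 'tokens');
out = vertcat(out{:});
symbols = out(:,1);
values  = str2double(out(:,2));
```

For the example above, str becomes 'Fe20Co20Cr10Mo30B20'. Elements that occur both inside and outside a group are not merged; if needed, they could be summed afterwards, for example with unique and accumarray.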
Чебура́шка, yes!