Extract regexp tokens with regexpPattern

Jan Kappen on 29 Feb 2024
Commented: Jan Kappen on 29 Feb 2024
With regexp I could extract the tokens of my capture groups via
regexp("abcd3e", "\w+(\d)+\w", "tokens")
ans = 1×1 cell array
{["3"]}
The result is a cell array. With the new regexpPattern and extract functions, the return values are usually strings or string arrays, which I prefer.
Question: Is there an analogue of the above regexp call using something like extract("abcd3e", regexpPattern("\w+(\d)+\w"), "tokens")? This syntax obviously does not work in R2023b, but are there standard ways to rewrite these patterns so that they return my tokens?
Thanks,
Jan
EDIT: This is just a toy example. I do not only want to extract digits, which could be done with digitsPattern. Ideally, I'd like to understand how to translate the regexps directly.
To show a more realistic example:
str = [
"42652Z_HEX"
"42652X"
"42652Y"
"42652Z"
"42652GYRO-X_HEX"
"42652GYRO-Y_HEX"
"42652GYRO-Z_HEX"
"42351Temp_HEX"
"42652Temp_HEX"
"42652GYRO-X"
"42652GYRO-Y"
"42652GYRO-Z"
"42351Temp"
"42652Temp"
];
res = string(regexp(str, "\d+(?:GYRO-)?([XYZ])?.*", "tokens"))
res = 14×1 string array
    "Z"
    "X"
    "Y"
    "Z"
    "X"
    "Y"
    "Z"
    ""
    ""
    "X"
    "Y"
    "Z"
    ""
    ""
% how to get the same result with matches and regexpPattern?
  2 Comments
Dyuman Joshi on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
If you just want to extract numbers between letters -
str = "abcd3e57xyz";
out = extract(str, digitsPattern)
out = 2×1 string array
    "3"
    "57"
Jan Kappen on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
Thanks for your answer.
No, I do not only want to extract numbers; it's a toy example. I'd like to translate the regexps that already exist into the new regexpPattern, if possible. The regexp might get more complicated than the one shown. I'll edit my question accordingly.

Answers (1)

the cyclist on 29 Feb 2024
I realize that this is not really an answer to your question, but I just wanted to make sure you are aware that one option is to wrap the string function around the regexp:
string(regexp("abcd3e fghi4j", "\w+(\d)+\w", "tokens"))
ans = 1×2 string array
"3" "4"
Also, if you are guaranteed to have only one match, you could do
regexp("abcd3e", "\w+(\d)+\w", "tokens","once")
ans = "3"
but that's somewhat fragile coding, I would say.
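A minimal sketch of one way to make the single-match assumption explicit instead of relying on "once", which silently keeps only the first match. The variable names and the error message are illustrative choices, not something from this thread:

```matlab
txt = "abcd3e";

% Without "once", regexp returns one cell per match.
tok = regexp(txt, "\w+(\d)+\w", "tokens");

% Fail loudly if the "exactly one match" assumption does not hold.
assert(isscalar(tok), "Expected exactly one match, found %d.", numel(tok));

% Token of the only match, converted to a string.
digit = string(tok{1});
```

If the input ever contains zero or several matches, this stops with a clear message rather than returning a partial result.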
I'm not yet sure if there is a more "direct" way with more recent functions.
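As far as I know, pattern objects do not expose capture groups, so a possible workaround is to move the context around the token into lookBehindBoundary and lookAheadBoundary and extract only the part that would have been captured. The following is a sketch for the toy input only, under the assumption that the token is a digit run with a letter on each side; it is not an exact translation of the regexp:

```matlab
txt = "abcd3e fghi4j";

% Match digits only where a letter sits directly before and after them.
% The boundaries assert the context without becoming part of the match.
pat = lookBehindBoundary(lettersPattern(1)) + digitsPattern ...
    + lookAheadBoundary(lettersPattern(1));

% extract returns a string array, so no cell unwrapping is needed.
tok = extract(txt, pat);
```

Note the differences from the original expression: (\d)+ captures only the last digit of a run, while this pattern returns the whole run, and \w also matches digits and underscores, which lettersPattern does not.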
  2 Comments
the cyclist on 29 Feb 2024
Your updated question clarifies that my answer is not what you are looking for, but I'll leave it here anyway. :-)
Jan Kappen on 29 Feb 2024
Thank you very much for your answer.
Yes, I updated the question to clarify a bit, sorry.
There were cases in the past where I could not cast to string, I'll need to check why. In fact that's not a terrible solution, but I'm simply wondering how to use the new regexpPattern properly and maybe I'm missing something.
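The thread does not say why the cast to string failed. One plausible cause (an assumption on my part) is input where some elements have no match, or several, so the nested cell array is no longer uniform. A sketch of a loop-based helper that always returns one string per element; tokensToString is a hypothetical name, not a MATLAB function:

```matlab
% Usage: first token per element, "" where nothing was captured.
str = ["42652Z_HEX"; "42652GYRO-X"; "42351Temp"];
res = tokensToString(str, "\d+(?:GYRO-)?([XYZ])?.*");

function out = tokensToString(str, expr)
% Hypothetical helper: first token of the first match for each element.
str = string(str);
out = strings(size(str));          % defaults to "" for every element
for k = 1:numel(str)
    tok = regexp(str(k), expr, "tokens", "once");
    % Keep the default "" when there is no match or the group is unmatched.
    if ~isempty(tok) && ~ismissing(tok(1))
        out(k) = tok(1);
    end
end
end
```

The loop trades some speed for predictability: the output always has the same size as the input, regardless of how many matches each element has.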
