Regexp lookbehind and lineanchors

alelap on 12 Sep 2019
Edited: alelap on 16 Sep 2019
Could someone help me to understand why
st = ' a b c';
pattern = '(?<=^\s*)c';
regexp(st,pattern,'lineanchors')
ans =
[]
i.e., does not match (as I expected), while
st2 = [newline,st];
regexp(st2,pattern,'lineanchors')
ans =
7
i.e., finds a match?
My intent is to match 'c' that is preceded by the beginning of a line and zero or more white character. How should I do this?
  2 comments
Stephen23 on 12 Sep 2019
Edited: Stephen23 on 13 Sep 2019
Getting an output of 7 seems like a bug to me. Strangely the bug occurs even if the "zero or more matches" character does not even exist in the input string (R2012b):
>> regexp([char(10),st],'(?<=^_*)c','lineanchors') % Underscore is not in st.
ans =
7
>> regexp([char(10),st],'(?<=^)c','lineanchors') % expected
ans =
[]
>> regexp(st,'(?<=^_*)c','lineanchors') % expected
ans =
[]
What MATLAB version are you using?
You should report this as a bug, giving a link to this thread.
alelap on 12 Sep 2019
Edited: alelap on 16 Sep 2019
R2019a. Reported to Technical Support.
Edit: bug confirmed. Excerpt from MATLAB Support's answer:
The behavior that you observed is indeed a bug in "regexp", which the developers are now aware of, and which might be addressed in some future release.
However, a workaround does exist, which consists in giving up the 'lineanchors' option (which makes the "^" and "$" metacharacters match at embedded newlines too) and instead grouping the (absolute) beginning of the text "^" and the embedded newline "\n" as two alternatives.
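The workaround in the excerpt can be sketched as follows. The exact pattern that Support proposed is not quoted in this thread, so the pattern below is an assumption, and st3 is a made-up test string. It uses [ \t] rather than \s, because \s also matches newline characters and could therefore reach back across a line break.

```matlab
% Illustrative inputs (st is from the question, st3 is hypothetical).
st  = ' a b c';                  % 'c' follows 'a' and 'b', so no match is wanted
st3 = sprintf(' a b c\n  c');    % on the second line only blanks precede 'c'

% Assumed form of the workaround: no 'lineanchors' option. Inside the
% lookbehind, '^' (start of the whole text) and '\n' (an embedded newline)
% are two alternatives, followed by zero or more blanks or tabs.
pattern = '(?<=(?:^|\n)[ \t]*)c';

idx1 = regexp(st,  pattern);     % start indices of matches in st
idx3 = regexp(st3, pattern);     % start indices of matches in st3
```

If the pattern behaves as intended, idx1 should be empty and idx3 should point at the 'c' on the second line. Given the bug discussed above, this is worth checking on the release actually in use.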

Accepted Answer

per isakson on 13 Sep 2019
Edited: per isakson on 16 Sep 2019
"My intent is to match 'c' that is preceded by the beginning of a line and zero or more white character."
In the character array, ' a b c', the character, 'c', is (after the beginning of the line) preceded not only by whitespace but also by the characters 'a' and 'b'. Thus, [] is the expected result. Try
%%
chr = ' a b c';
xpr = '(?<=^[ ab]*)c';
regexp( chr, xpr, 'match', 'lineanchors' )
that returns
ans =
1×1 cell array
{'c'}
I fail to understand the behavior of your second example. I expect [], not 7. It looks like a bug to me.
/R2018b
ADDENDUM
I learned something about the 'once' option the other day: it affects the type of the output. In this case the output is a character row vector instead of a cell array containing that row vector. Thus,
>> regexp( chr, xpr, 'match', 'lineanchors', 'once' )
ans =
'c'
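A small sketch of how 'once' changes the output type in a few other cases. The input string and the patterns are made up for illustration and use no lookbehind, so the bug discussed in this thread does not come into play.

```matlab
chr = 'abc123def45';                                 % hypothetical input

% Without 'once': every match is returned, wrapped in a cell array.
allMatches = regexp(chr, '\d+', 'match');            % {'123', '45'}

% With 'once': only the first match, returned as a plain char row.
firstMatch = regexp(chr, '\d+', 'match', 'once');    % '123'

% With 'once' and no match the result is an empty char array, not a cell.
noMatch = regexp(chr, 'xyz', 'match', 'once');

% The same applies to the default output (start indices).
allStarts  = regexp(chr, '\d+');                     % [4 10]
firstStart = regexp(chr, '\d+', 'once');             % 4
```

The practical consequence is that code using 'once' can test the result with isempty and use it directly, without indexing into a cell array first.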
  2 comments
alelap on 13 Sep 2019
Thank you. It works. Unfortunately, my example is a simplification of my real scenario where a b c are more complicated expressions and I cannot use this method.
per isakson on 13 Sep 2019
"Could someone help me to understand why" I think I did that.
I cannot help regarding the "real scenario" because of lack of information.
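When the text before 'c' cannot be enumerated in a character class, one option that avoids lookbehind altogether is to let the match consume the leading blanks and capture only 'c' as a token. This is a minimal sketch, not something proposed in the thread: st3 and the helper name tokenStarts are hypothetical, and [ \t] is used instead of \s so that the match cannot run across a newline.

```matlab
st  = ' a b c';                  % from the question: should not match
st2 = [char(10), st];            % the input that triggered the reported bug
st3 = sprintf(' a b c\n  c');    % hypothetical input with a matching second line

% '^' with 'lineanchors' anchors at the start of every line, [ \t]* consumes
% the leading blanks, and the parentheses capture only the 'c'.
pattern = '^[ \t]*(c)';

% Hypothetical helper: start index of the captured token in each match.
% 'tokenExtents' returns one cell per match holding [start end] rows.
tokenStarts = @(str) cellfun(@(ext) ext(1,1), ...
    regexp(str, pattern, 'tokenExtents', 'lineanchors'));

s1 = tokenStarts(st);            % no line starts with blanks followed by 'c'
s2 = tokenStarts(st2);           % same, despite the leading newline
s3 = tokenStarts(st3);           % position of the 'c' on the second line
```

A more complicated expression can take the place of c inside the parentheses. The trade-off is that the overall match now includes the leading blanks, so the position of interest has to be read from the token rather than from the match itself.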