How to effectively use look ahead with regexp?

Question

pietro on 26 Jun 2017

https://la.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp

Edited: Stephen23 on 27 Jun 2017

Hi all,

I'm doing some coding with regular expressions, but there are a couple of things I can't understand. Look at the following:

1. searching the letter 'r' followed by a number:

regexp('19f/4r power shift','(?<=\d*) ?r')
ans = 
  6    12
regexp('19f/4r power shift','(?<=\d)\s?r')
ans = 
    6

Why does the '*' change the result so much? The 'r' at the 12th position is not followed by any number.

2. Searching for the word 'Reverser' when it is not preceded by the words 'power' or 'powr':

regexp('power  Reverser','(?<!powe?r) *-? *Reverser','match')
ans = 
    ' Reverser'

'Reverser' is preceded by the string 'power', so it shouldn't be selected.

Why do these occur?

Thanks

Best regards,

Pietro

Accepted Answer

Stephen23 on 26 Jun 2017

https://la.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp#answer_271972

Edited: Stephen23 on 26 Jun 2017

1. "searching the letter 'r' followed by a number." Actually, you seem to want to search for the letter 'r' preceded by a number, not "followed by" one. Only your second regexp does this. By adding the * to the first regexp you make the digits optional (the asterisk matches zero or more times!), and a lookbehind that can match zero characters is satisfied at every position in the string. So the second 'r' in that short string clearly matches your first regular expression: it is an 'r' preceded by zero spaces (permitted by the ?) and by zero digits (permitted by the *).

You could use + (match one or more) rather than * (match zero or more):

regexp('19f/4r power shift','(?<=\d+)\s?r')

but this is not really necessary: matching one digit is enough because if there are multiple digits then there is also one digit.
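A quick way to check this is to run the three lookbehinds side by side on your string. This is a minimal sketch using only the default output of regexp (the start index of each match); the variable names are illustrative, and the expected indices come from counting characters in '19f/4r power shift' (the first 'r' is character 6, the 'r' in 'power' is character 12).

```matlab
str = '19f/4r power shift';

% (?<=\d*) is satisfied by zero digits, so it holds at every position:
% both the 'r' after '4' (index 6) and the 'r' in 'power' (index 12) match.
idxStar = regexp(str, '(?<=\d*)\s?r');   % [6 12]

% Requiring at least one digit before the optional space rejects 'power'.
idxOne  = regexp(str, '(?<=\d)\s?r');    % 6
idxPlus = regexp(str, '(?<=\d+)\s?r');   % 6, same result as (?<=\d)

disp(idxStar); disp(idxOne); disp(idxPlus);
```

Since one digit is enough to satisfy the lookbehind, the \d and \d+ versions always agree; the + only adds work for the engine.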

2. This is a much more subtle problem. The cause is the optimism of regular expressions, combined with that * on the space character. The regular expression engine keeps trying new combinations (including different starting positions in the string) until it finds one that matches, which clearly differs from how you picture its operation (you want it to give up as soon as the lookaround fails once).

The lookbehind does correctly reject the position directly after 'power'. But you placed an asterisk * on the space, so the engine can also begin the match one character later, at the second space. There, ' *' matches that single space, and the lookbehind checks what precedes that position: another space character! A space is not 'power' or 'powr', so the lookbehind is satisfied, the rest of the pattern matches, and the engine returns ' Reverser'.

Basically, what you seem to want is a pessimistic parser (return no match if any one combination matches that lookaround, even if others do not), but in reality regexp parsers are optimistic: they return a match if any one combination matches. They ignore the one case you are interested in because other cases satisfy their basic operating principle: find a match, wherever and however they can.
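To see exactly where the engine found its match, you can ask regexp for the start index and, with a capturing group around the first run of spaces, for what ' *' consumed. This is a sketch for illustration only; the extra group ( *) is added purely for inspection and does not change which text the pattern matches.

```matlab
str  = 'power  Reverser';            % two spaces between the words
expr = '(?<!powe?r) *-? *Reverser';

% Outputs are returned in the order the keywords are listed.
[m, s] = regexp(str, expr, 'match', 'start');
% m is {' Reverser'} and s is 7: the match begins at the SECOND space,
% where the text just before it is 'power ' (ending in a space),
% so the negative lookbehind (?<!powe?r) is satisfied there.

% Same pattern with the first run of spaces captured as a token.
tok = regexp(str, '(?<!powe?r)( *)-? *Reverser', 'tokens', 'once');
% tok is {' '}: ' *' consumed exactly one space.

disp(m); disp(s); disp(tok);
```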

To see which parts of the string are matched, you could use a dynamic regular expression, e.g. adding:

(?@disp($1))

into your regexp (after a capturing group, since $1 refers to the first token) and seeing how the string is parsed.

Do you really need to match an unknown number of space characters?

2 comments

pietro on 26 Jun 2017

I got it! Thanks a lot.

Stephen23 on 27 Jun 2017

Edited: Stephen23 on 27 Jun 2017

You could move the spaces inside the lookbehind, so that it rejects 'power' or 'powr' followed by any number of spaces:

>> regexp('power  Reverser','(?<!powe?r *)Reverser','match')
ans = 
     {}
>> regexp('power X Reverser','(?<!powe?r *)Reverser','match')
ans = 
    'Reverser'
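A small batch test makes the difference easy to see. The sketch below runs the revised pattern over a few test strings (the strings are hypothetical, chosen to cover the cases in the question), plus an assumed extension (?<!powe?r *-? *) that also skips the optional hyphen allowed by the original pattern.

```matlab
% Hypothetical test strings, not taken from the original thread.
strs = {'power  Reverser', 'powr Reverser', 'powerReverser', ...
        'power X Reverser', 'Reverser'};

% Spaces inside the lookbehind: 'Reverser' must start at the 'R', and the
% lookbehind sees all of the spaces before it, so there is no way around it.
expr = '(?<!powe?r *)Reverser';

% With a cell array input, regexp returns one result per string;
% 'once' makes each result a char vector (empty when there is no match).
hits    = regexp(strs, expr, 'match', 'once');
isMatch = ~cellfun(@isempty, hits);      % [0 0 0 1 1]

% Assumed extension: also reject 'power - Reverser' (optional hyphen).
exprDash    = '(?<!powe?r *-? *)Reverser';
dashStrs    = {'power - Reverser', 'power X Reverser'};
isMatchDash = ~cellfun(@isempty, regexp(dashStrs, exprDash, 'match', 'once'));  % [0 1]

disp(isMatch); disp(isMatchDash);
```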
