Regular Expression to detect spaces in a string

Question

Deepak on 14 Oct 2013

https://la.mathworks.com/matlabcentral/answers/90169-regular-expression-to-detect-spaces-in-a-string

Commented: Cedric on 14 Oct 2013

Hello all, I have a string, for example:

string='<abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit not yet decided  abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit not yet decided>'

I want to use regexp to get all the white spaces that occur between " and </a>. I have been trying to figure out how to do this with regexp but have not yet found an elegant solution. For example, regexp(string,'(?<="\S*)\s') returns only 2 spaces, not all of them. Could someone help me out?

Thanks a lot

2 comments

Cedric on 14 Oct 2013

Edited: Cedric on 14 Oct 2013

What do you mean by "spaces"? Is it just white spaces or all characters? If you really meant white spaces, is it their position that you want? If you want characters, what is the purpose? REGEXP can parse the whole tag and extract whatever part you want.

Jan on 14 Oct 2013

There are two " characters in this string. Which one do you mean? Please post the desired result by editing the question (not as a comment or pseudo-answer).

Answer 1 (accepted)

Deepak on 14 Oct 2013

Hi Cedric, thanks for the really detailed answer. It really helped. I actually wanted to get the positions of the white spaces, so the second part of the answer really addresses my query. I was hoping to get the white spaces with a single regexp, without using other commands like isspace, but I guess that would be complicated... I am not really familiar with tokens. So once again, thanks for your detailed answer.

1 comment

Cedric on 14 Oct 2013

Edited: Cedric on 14 Oct 2013

Hi Deepak, the issue with finding spaces using regexp is that it cannot be done with a simple query. The call to regexp (or possibly regexprep) that we would need would be much more complicated than doing the whole operation with one call to regexp using a simple pattern, plus a few additional operations.

Answer 2

Cedric on 14 Oct 2013

Edited: Cedric on 14 Oct 2013

Here is an example assuming that you want characters between " and </a> and not only white spaces:

 >> s = regexp( html, '(?<=")[^"]+(?=</a>)', 'match' )
 s = 
    '>Mathworks'    '>Google'

Look-arounds are treacherous in situations like this one, where the expression in the look-behind can appear several times before the expression in the look-ahead is found. The following example illustrates this:

 >> s = regexp( html, '(?<=").+?(?=</a>)', 'match' )
 s = 
    'http://www.mathworks.com">Mathworks'    'http://www.google.com">Google'

where we see that the lazy .+? does not give the "smallest possible match". Let me know if you want to understand why, or see the example/discussion between Per and me here.
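A minimal sketch of why this happens, using a hypothetical short string rather than html: the regex engine tries start positions from left to right and keeps the first one at which the whole pattern can succeed. Laziness only shortens where a match ends; it never moves where a match starts. Excluding the quote with [^"] is what forces the later start.

```matlab
% Hypothetical test string: two " characters before a single </a>
str = 'x"a"b</a>';

% Lazy .+? still starts right after the FIRST " (the leftmost viable start)
lazyMatch = regexp(str, '(?<=").+?(?=</a>)', 'match')

% [^"]+ cannot cross a ", so the only viable start is after the LAST "
classMatch = regexp(str, '(?<=")[^"]+(?=</a>)', 'match')
```

Here lazyMatch is {'a"b'} while classMatch is {'b'}.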

Note that using tokens is generally more efficient than using look-arounds:

 >> s = regexp( html, '"([^"]+)</a>', 'tokens' ) ;
 >> celldisp(s)
 s{1}{1} =
         >Mathworks
 s{2}{1} =
         >Google
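The answer does not show the html variable these calls operate on. The sketch below assumes a hypothetical two-link string that reproduces the outputs above, and shows the practical difference in output shape: 'match' returns a flat cell array of character vectors, whereas 'tokens' returns one nested cell per match, which [s{:}] flattens.

```matlab
% Hypothetical input (not from the original post), chosen to reproduce the outputs above
html = ['<a href="http://www.mathworks.com">Mathworks</a> and ', ...
        '<a href="http://www.google.com">Google</a>'];

% Look-around version: flat 1x2 cell array of character vectors
viaMatch = regexp(html, '(?<=")[^"]+(?=</a>)', 'match');

% Token version: 1x2 cell array whose elements are 1x1 cells (one per token)
viaTokens = regexp(html, '"([^"]+)</a>', 'tokens');

% Concatenating the inner cells gives the same shape as viaMatch
flatTokens = [viaTokens{:}];
isequal(viaMatch, flatTokens)
```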

Back to the initial question: the pattern could be more specific if you wanted to extract the content or the value of the href parameter, e.g.

 >> s = regexp( html, '[^>]+(?=</a>)', 'match' )
 s = 
    'Mathworks'    'Google'

Or

 >> s = regexp( html, 'href.+?"([^"]*)', 'tokens' ) ;
 >> celldisp(s)
   s{1}{1} =
           http://www.mathworks.com
   s{2}{1} =
           http://www.google.com

Or

 >> s = regexp( html, 'href.+?"(?<href>[^"]*).*?>(?<content>.*?)</a>', 'names' )
 s = 
    1x2 struct array with fields:
      href
      content
 >> s(1)
 ans = 
       href: 'http://www.mathworks.com'
    content: 'Mathworks'
 >> s(2)
 ans = 
       href: 'http://www.google.com'
    content: 'Google'

All these approaches can be refined (at the cost of more complex patterns) to handle a broader set of cases, e.g. when there is a tag inside the content of the anchor tag.
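As a hedged illustration of the nested-tag case, assume a hypothetical anchor whose content is wrapped in a <b> tag. The [^>]+(?=</a>) pattern then finds nothing, because the character right before </a> is the > of </b>. The named-token pattern still captures the content, inner tag included, and a follow-up regexprep can strip that tag. This is a sketch for this one case, not a general HTML parser.

```matlab
% Hypothetical anchor whose content contains a nested tag
html = '<a href="http://www.mathworks.com"><b>Mathworks</b></a>';

% The content-only pattern finds nothing: the character before </a> is '>'
content1 = regexp(html, '[^>]+(?=</a>)', 'match');

% The named-token pattern still matches, but content keeps the inner tag
s = regexp(html, 'href.+?"(?<href>[^"]*).*?>(?<content>.*?)</a>', 'names');

% Strip any remaining tags from the captured content
plainContent = regexprep(s.content, '<[^>]*>', '');
```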

EDIT: if you really want to get the positions of the white spaces, your expression does work, but not as you thought. Its look-behind is satisfied only for the space that ends each of these two substrings:

'"abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit '

and

'">abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit '

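which both start with a " followed by non-whitespace characters up to the t of Limit. The spaces after not and yet are preceded by a run of non-whitespace characters that starts after another space rather than after a ", so the look-behind fails for them. One thing you could do, if you want to keep the pattern simple, is to get the starting and ending positions of the relevant substring: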

 >> [mStart, mStop] = regexp( html, '(?<=")[^"]+(?=</a>)', 'start', 'end' )
 mStart =
    76
 mStop =
   132

and use them to mask a logical index of white-space positions:

 >> isSpace = html == ' ' ;
 >> isSpace(1:mStart-1)  = false ;
 >> isSpace(mStop+1:end) = false ;
 >> find( isSpace )
 ans =
   117   121   125
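The masking above uses mStart and mStop as scalars, so it assumes a single match. A sketch that generalizes it to several anchors runs a second, trivial regexp on each matched substring and shifts the offsets back to html indices. It assumes a hypothetical reconstruction of the tag from the question; the exact original string did not survive, so the indices differ from 117 121 125 above.

```matlab
% Hypothetical reconstruction of the tag from the question (exact original not recoverable)
html = ['<a href="abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit not yet decided">', ...
        'abcd/abcd/yxz/xyz/MOTOR50DEV/sdsds/Limit not yet decided</a>'];

% Start/end of every substring between the last " and the following </a>
[mStart, mStop] = regexp(html, '(?<=")[^"]+(?=</a>)', 'start', 'end');

spacePos = [];
for k = 1:numel(mStart)
    % Whitespace offsets inside the k-th substring (1-based)
    offsets = regexp(html(mStart(k):mStop(k)), '\s');
    % Convert offsets back to indices into html
    spacePos = [spacePos, offsets + mStart(k) - 1]; %#ok<AGROW>
end
spacePos
```

For this input, mStart is 67, mStop is 123 and spacePos is [108 112 116].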
