MATLAB Answers

Regular expression. Is nesting of group operators supported?

per isakson on 17 Dec 2018
Edited: per isakson on 14 Oct 2019
With regard to Grouping Operators, the function regexp doesn't behave the way I expected.
>> cac = regexp( 'ABC', '((A)(B(C)))', 'tokens' );
>> cac{1}(:)
ans =
1×1 cell array
{'ABC'}
regexp returns one token without any complaint about my parentheses. I expected four: 'ABC', 'A', 'BC' and 'C', because most other flavors of regular expressions would have returned four tokens. Java's documentation on Capturing Groups, for example, states:
In the expression ((A)(B(C))), for example, there are four such groups:
  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)
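For comparison, Java's numbering can be checked from within MATLAB itself, since MATLAB can call Java classes directly. The following is only a minimal sketch, assuming MATLAB is running with its JVM enabled; the variable names are illustrative.

```matlab
% Compile the pattern with Java's regex engine instead of MATLAB's regexp.
p = java.util.regex.Pattern.compile('((A)(B(C)))');
m = p.matcher(java.lang.String('ABC'));

groups = {};
if m.find()
    n = m.groupCount();                % number of capturing groups in the pattern
    groups = cell(n, 1);
    for k = 1:n
        % char() guards against getting a java.lang.String back
        groups{k} = char(m.group(k));
    end
end
disp(groups)
```

Java numbers groups by their opening parentheses from left to right, so groups should hold 'ABC', 'A', 'BC' and 'C', in that order.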
A couple more examples:
>> cac = regexp( 'ABC', '(A)(B(C))', 'tokens' );
>> cac{1}(:)
ans =
2×1 cell array
{'A' }
{'BC'}
>> cac = regexprep( 'ABC', '((A)(B(C)))', ' --- $1 ---' )
cac =
' --- ABC ---'
>> cac = regexprep( 'ABC', '((A)(B(C)))', ' --- $2 ---' )
cac =
' --- $2 ---'
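One reading of the last result is that only the outermost group produces a token, so $2 refers to a token that does not exist and is left in the output as literal text. A minimal sketch, assuming it is acceptable to drop the outer parentheses so that (A) and (B(C)) become the top-level groups:

```matlab
% With the outer parentheses removed, the pattern has two top-level
% groups, matching the two tokens 'A' and 'BC' seen with regexp above.
out1 = regexprep('ABC', '(A)(B(C))', ' --- $1 ---');
out2 = regexprep('ABC', '(A)(B(C))', ' --- $2 ---');
disp(out1)
disp(out2)
```

Given the two tokens that regexp returns for this pattern, $1 should expand to 'A' and $2 to 'BC'. The innermost (C) still has no token number of its own.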
The documentation on Grouping Operators is terse and has only a few examples. I've found nothing on "groups inside groups".
Question:
Is nesting of group operators supported, or am I a victim of wishful thinking?


Accepted Answer

Sean de Wolski on 17 Dec 2018
Edited: Sean de Wolski on 17 Dec 2018
The Note below "Named Token Operator" in the regexp documentation says that only the outermost set of parentheses is captured, hence the single token 'ABC'.
Note
If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.
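The behavior described in the note can be worked around with a second pass over the outer token. This is only a sketch of one possible approach, not something taken from the documentation; the variable names are illustrative.

```matlab
str = 'andrew';

% First pass: only the outermost group yields a token.
outer = regexp(str, '(and(y|rew))', 'tokens', 'once');

% Second pass: re-match the outer token with the inner group promoted
% to top level, so that it now yields a token of its own.
inner = regexp(outer{1}, 'and(y|rew)', 'tokens', 'once');

disp(outer)
disp(inner)
```

Here outer holds 'andrew' and inner holds 'rew'. The cost is one extra regexp call per nesting level.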
With string arrays, I'd recommend simply passing an array of patterns, one for each token you want:
cac = regexp("ABC", ["(ABC)","(A)", "(BC)", "(C)"], 'tokens' );
cac{:}
ans =
1×1 cell array
{["ABC"]}
ans =
1×1 cell array
{["A"]}
ans =
1×1 cell array
{["BC"]}
ans =
1×1 cell array
{["C"]}
