How, if possible, do I limit the number of times REGEXP searches for a specific pattern?
I’m using a regular expression to search blocks of text that look like the following:
MSN_BER (0:31) Observation #1 Rx'd at: (58570.000) Msg. Time: (58568.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
State Time: 12:00:00.000 (58571.000)
State Position: -1500.0000, -5000.0000, 4100.0000
MSN_RAM (0:32) Observation #20 Rx'd at: (58569.000) Msg. Time: (58569.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 5678 Remote Num: 1 Number of Obsevations: 1
Type: 1 Track ID: 12345 Time Tag: 58573.00000000
Band ID: 1 AD ID: 21 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
MSN_RAM (0:32) Observation #30 Rx'd at: (58569.000) Msg. Time: (58569.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 5678 Remote Num: 1 Number of Obsevations: 2
Type: 1 Track ID: 12345 Time Tag: 58583.00000000
Band ID: 1 AD ID: 31 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58585.00000000
Band ID: 1 AD ID: 32 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Note: There is no 2nd MSN_BER data block.
I’m using the following search pattern and REGEXP function to extract the time tag and AD ID values:
exp = '([\d\.]+)\s+Band[^A]+?AD ID:\s+(\d+).';
tokens3 = regexp(bufferSplit{BlockId}, exp, 'tokens');
This results in: tokens3 = {1x2 cell} {1x2 cell} {1x2 cell},
where the time tag and AD ID are contained in the cells for each occurrence in the block of text.
>> tokens3{1,1}
ans = '58573.00000000' '21'
>> tokens3{1,2}
ans = '58583.00000000' '31'
>> tokens3{1,3}
ans = '58585.00000000' '32'
What I’m trying to do is limit the search: specifically, limit the number of times the time tag and AD ID values are matched, based on the fact that there is no 2nd MSN_BER data block. I know the 'once' option returns only the first match found, but there could be multiple occurrences of the AD ID and its associated time tag.
The result of this would be: tokens3 = {1x2 cell}
>> tokens3{1,1}
ans = '58573.00000000' '21'
Can this be accomplished using the REGEXP function?
3 comments
Cedric
14 Nov 2013
I still can't quite figure out what you are trying to achieve based on the statement and the comments under Walter's answer. Could you describe a little better how/why you want to limit the search? Is the limit something that you set a priori, e.g.
nMax = 4 ;
pattern = sprintf( 'whatever{1,%d}', nMax ) ;
or is it something that should be based on the content of the file, e.g. matching all occurrences before a certain pattern is found?
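For the first case (a limit fixed a priori), here is a minimal sketch of building the bounded quantifier with sprintf. The sample string and the value of nMax are made up for illustration; only the {1,%d} idea comes from the snippet above.

```matlab
% Hypothetical sample text: five time tags in a row (not from the real log).
txt = 'Time Tag: 1.0 Time Tag: 2.0 Time Tag: 3.0 Time Tag: 4.0 Time Tag: 5.0';

nMax = 3;  % upper bound chosen a priori (assumed value)

% Build a pattern that matches between 1 and nMax consecutive entries.
% Backslashes are doubled because sprintf interprets escape sequences.
pattern = sprintf('(?:Time Tag:\\s+[\\d.]+\\s*){1,%d}', nMax);

% 'once' returns only the first run, which spans at most nMax entries.
firstRun = regexp(txt, pattern, 'match', 'once');

% Pull the individual values out of that bounded run.
tags = regexp(firstRun, '[\d.]+', 'match');
```

Here tags should hold only the first three values, because the quantifier stops the run after nMax repetitions.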
Brad
15 Nov 2013
Cedric
16 Nov 2013
So you have a situation like the following?
MSN_BER
...
MSN_RAM
...
Type: - this block of data could occur between 1 and several hundred times
MSN_RAM ** No MSN_BER, so Type entries should be discarded.
...
Type: - this block of data could occur between 1 and several hundred times
MSN_BER
...
MSN_RAM
...
Type: - this block of data could occur between 1 and several hundred times
If so, what do you want to achieve? Is it to get a stat on the time of all Type entries that belong to any MSN_BER, or is it a stat per MSN_BER, or something else?
Accepted answer
More answers (1)
Walter Roberson
12 Nov 2013
After a pattern, perhaps enclosed in () or (?:), you can put {minimum,maximum} counts. For example
'(?:\d\w){3,7}'
would match 3, 4, 5, 6, or 7 occurrences of \d\w repeated.
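A quick sketch of how that quantifier behaves on a few made-up strings (the inputs are illustrative only):

```matlab
pattern = '(?:\d\w){3,7}';

% Only 2 repetitions of digit+word-character: below the minimum, so no match.
tooFew = regexp('1a2b', pattern, 'match');

% 4 repetitions: within the 3..7 range, so the whole string matches.
inRange = regexp('1a2b3c4d', pattern, 'match');

% 9 repetitions: the greedy quantifier takes 7 and stops; the remaining
% 2 repetitions are below the minimum, so there is no second match.
capped = regexp('1a2b3c4d5e6f7g8h9i', pattern, 'match');
```

The maximum caps the length of each individual match; it does not cap how many separate matches regexp returns.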
7 comments
Brad
13 Nov 2013
Walter Roberson
13 Nov 2013
You can sprintf() up a regular expression.
You could also consider using a variable number of repetitions combined with a look-ahead expression, if there is a particular marker you can recognize.
Brad
14 Nov 2013
Walter Roberson
14 Nov 2013
Yes. This can be useful especially in conjunction with a lazy match instead of a greedy match. For example,
'((?:\w+=).*?)(?:MSN_RAM)'
If .* had been used, then because . matches every character in MSN_RAM, the greedy .* would consume MSN_RAM instead of stopping at it. If the lazy match .*? had been used by itself, it would only find the first match and none of the others. But combine the lazy match .*? with the look-ahead expression (?:MSN_RAM) and the lazy .*? is forced to keep matching until it finds that MSN_RAM is the next thing in the string being searched.
(Not the greatest of examples but you get the idea, I hope.)
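To make the greedy/lazy difference concrete, here is a small sketch on a made-up string. The terminator is written as plain text here, so it is consumed as part of each match.

```matlab
% Hypothetical input with two "word=" entries, each followed by the marker.
txt = 'a=1 MSN_RAM b=2 MSN_RAM';

% Greedy: .* runs to the end and backtracks only to the LAST marker,
% so everything collapses into a single match.
greedy = regexp(txt, '(\w+=.*)MSN_RAM', 'tokens');

% Lazy: .*? stops at the FIRST marker it can, so each entry is matched
% separately.
lazy = regexp(txt, '(\w+=.*?)MSN_RAM', 'tokens');

fprintf('greedy matches: %d, lazy matches: %d\n', numel(greedy), numel(lazy));
```

With this input the greedy pattern should yield one token that swallows the first marker, while the lazy pattern should yield one token per entry.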
Brad
14 Nov 2013
Walter Roberson
14 Nov 2013
Sorry, the look-ahead should be ?= rather than ?:
'((?:\w+=).*?)(?=MSN_RAM)'
The \w+= was just a sample pattern I tossed in for illustration; it matches a "word" followed by an equals sign.
The structure would be
(pattern_to_repeat)*?(?=pattern_to_stop_before)
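A minimal sketch of why the look-ahead matters, using a shortened, made-up stand-in for the log text. The `|$` alternative is an addition of mine so that the last block, which has no following marker, is still captured.

```matlab
% Hypothetical, shortened stand-in for the real data blocks.
txt = 'MSN_BER x=1 MSN_RAM y=2 MSN_RAM z=3';

% Look-ahead: stop before the next marker WITHOUT consuming it, so the
% marker is still available to start the next match.
blocks = regexp(txt, 'MSN_\w+.*?(?=MSN_|$)', 'match');

% Non-capturing group instead of look-ahead: the marker IS consumed,
% so the block that begins with it is lost.
consumed = regexp(txt, 'MSN_\w+.*?(?:MSN_|$)', 'match');

fprintf('with look-ahead: %d blocks, without: %d blocks\n', ...
    numel(blocks), numel(consumed));
```

With the look-ahead, each of the three blocks comes back as its own match, and a token pattern such as the time tag / AD ID expression could then be applied per block. Without it, the middle block is skipped.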
Brad
15 Nov 2013