How can I arrange my output from regexp stored in multiple cells in a for loop?
Hi, I am using regexp to extract and match data from a text string with the following code:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens'); %find RVR in DATALow
SortTokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %stack the RVR pairs of each row vertically
My output is stored in cells within a cell like this:
[]
[]
[]
<4x2 cell>
<4x2 cell>
<4x2 cell>
[]
[]
Each non-empty output cell contains data like this:
'R01L' '1500'
'R19R' '1500'
'R01R' '1300'
'R19L' '1500'
However, the output cells vary in size and can also look like this:
[]
<1x2 cell>
<1x2 cell>
[]
My goal is to extract the data with a for loop that takes the size of each output cell into account, and to store it in a cell array, using this code:
NoRUNWAY=ones(1,length(SortTokens)); %preallocate a vector of ones for speed
for j=1:length(SortTokens) %for every row of the cell
    NoRws=size(SortTokens{j,1},1); %count the rows (runways); length() would return 2 for a 1x2 cell
    if NoRws>0 %if the cell is not empty
        NoRUNWAY(j)=NoRws; %store the number of runways in this row
    end
end
isemp = cellfun('isempty', tokens); %find all empty cells in tokens
for l=1:length(SortTokens)
    RWYnum=NoRUNWAY(l);
    for k=1:RWYnum
        tempRUNWAY = cellfun(@(x) x{k,1}, SortTokens(~isemp), 'uni', 0);
        tempRVR = cellfun(@(x) x{k,2}, SortTokens(~isemp), 'uni', 0);
        RVR = nan(size(SortTokens));
        RVR(~isemp) = cellfun(@str2num, tempRVR);
        RVRnan=isnan(RVR);
        RVRnanx=find(RVRnan);
        RVR(RVRnanx)=9999;
        RWYcell{1,k}=tempRUNWAY(1);
        RVRcell{1,k}=RVR;
    end
end
The largest output cell is of size
<4x2 cell>
I would like to store the data in a new cell array with four columns and ultimately compare these values with other measurements.
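A minimal sketch of that four-column layout, not the asker's code: it assumes at most four runways per report (the hypothetical `maxRwy`), uses the question's pattern with the runway letter made optional (`R\d{2}[A-Z]?`) so that `R10/` and `R28/` also match, and leaves missing entries as NaN. The names `RWYcell` and `RVR` are reused for illustration only.

```matlab
% Two lines taken from the DATALow sample in this question
DATALow = {'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'; ...
           'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'};
% Question's pattern, with the runway letter made optional ([A-Z]?)
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');
SortTokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);

maxRwy = 4;                         % assumed maximum number of runways per report
nRows = numel(SortTokens);
RWYcell = cell(nRows, maxRwy);      % runway designators, [] where absent
RVR = nan(nRows, maxRwy);           % RVR values, NaN where absent
for j = 1:nRows
    t = SortTokens{j};              % k-by-2 cell of {runway, RVR}, or [] if no match
    nRwy = min(size(t, 1), maxRwy); % size(t,1) counts rows; length(t) gives 2 for a 1x2 cell
    for k = 1:nRwy
        RWYcell{j, k} = t{k, 1};
        RVR(j, k) = str2double(t{k, 2});
    end
end
disp(RVR)
```

Counting rows with `size(t, 1)` is what lets a `<1x2 cell>` count as a single runway. If the 9999 convention from the loop above is preferred, `RVR(isnan(RVR)) = 9999;` can be applied afterwards.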
Does this make sense? These are measurements of Runway Visual Range (RVR) at multiple runways at different airports, and I would like to compare them with the meteorological visibility at the same airports. The data I am using, called DATALow, looks like this:
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'
I also just realized that my regexp misses most of the RVR groups because it only matches runway designators of the form:
R19L/
which most airports do not use (for example, R10/ and R28/ above have no letter). Can someone please help with this?
15 comments
George
on 18 Oct 2016
What are you trying to extract from the DATALow text? I'm not clear on what data you're interested in.
Andrei Bobrov
on 18 Oct 2016
Linus Dock
on 18 Oct 2016
Guillaume
on 18 Oct 2016
Well, modifying the regex to take into account the optional letter at the end of the 'R' group is not a problem:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');
Just one extra ? after the letter range [A-Z].
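A quick check on the first line of the DATALow sample: the original pattern finds nothing because `R10` and `R28` carry no letter, while the version with the extra `?` returns both runway/RVR pairs. The variable names are only illustrative.

```matlab
s = 'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011';
oldTok = regexp(s, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');  % letter required
newTok = regexp(s, '\<(R\d{2}[A-Z]?)/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens'); % letter optional
fprintf('original pattern: %d matches\n', numel(oldTok));
fprintf('modified pattern: %d matches\n', numel(newTok));
pairs = vertcat(newTok{:})   % 2x2 cell: {'R10','1300'; 'R28','0750'}
```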
Linus Dock
on 18 Oct 2016
Linus Dock
on 18 Oct 2016
Guillaume
on 18 Oct 2016
At this point, it's probably better to make the regex less specific, rather than more specific. The following works on that larger sample of data:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens');
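To illustrate, here is that pattern applied to the second and last lines of the question's DATALow sample, which contain `VP1500` groups. For range groups such as `0500VP1500N` it returns the second value of the range.

```matlab
% Second and last lines of the DATALow sample from the question
DATALow = {'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'; ...
           'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'};
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens');
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
pairs{1}   % {'R10','0200'; 'R28','1500'}
pairs{2}   % {'R10','1500'; 'R28','0800'}
```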
Linus Dock
on 19 Oct 2016
Linus Dock
on 19 Oct 2016
The BECMG problem is easy to fix:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[^ ]*?(\d{4})[A-Z]?\>', 'tokens');
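Why `[^ ]*?` is safer than `.*?` can be seen on a hypothetical string (not taken from the thread's data) in which the runway group has no four-digit value and is followed by a BECMG trend group. `.*?` can cross the space and pick up the number after BECMG, while `[^ ]*?` cannot leave the runway group. This illustrates the failure mode under that assumption; it is not a reconstruction of the asker's actual BECMG case.

```matlab
% Hypothetical string: RVR value missing, followed by a BECMG trend group
s = 'R10///// BECMG 0800';
lazyAny   = regexp(s, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens');     % may cross spaces
lazyNoSpc = regexp(s, '\<(R\d{2}[A-Z]?)/[^ ]*?(\d{4})[A-Z]?\>', 'tokens');  % stays inside one word
lazyAny{1}          % {'R10','0800'}: the number after BECMG is taken as RVR
isempty(lazyNoSpc)  % true: no match, so no false RVR
```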
From your previous examples and questions, I assumed that the second part of the RVR is what you wanted, not the first part. The regexp is explicitly designed to discard that first part if present. It can certainly be changed to return the first part instead of the second.
However, at this point it would be much better if you thoroughly explained the grammar of that bit of the runway string: what is allowed and not allowed for each part, which parts or combinations of parts are optional, and, of course, which part you actually want.
Linus Dock
on 20 Oct 2016
Linus Dock
on 20 Oct 2016
Linus Dock
on 20 Oct 2016
Linus Dock
on 20 Oct 2016
Guillaume
on 20 Oct 2016
You cannot create a regular expression (even a dynamic one) that would match the smaller of the two numerical groups if both are present. You would have to return both groups and select the minimum afterwards.
I believe the following would suit:
%the regexp now returns three tokens per match; the last token of each match may be empty
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4})[A-Z]?(\d{4})?[A-Z]?\>', 'tokens');
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all matches of each row vertically
alltokens = vertcat(tokens{:}); %concatenate everything regardless of row; note that this removes empty rows
allvalues = str2double(alltokens(:, [2 3])); %convert RVR tokens to numbers. If a match has only one RVR value, its second token is converted to NaN
minvalues = min(allvalues, [], 2); %smallest RVR value of each match
If you are using an old version of MATLAB where min does not ignore NaNs by default, replace the NaNs with Inf before the call to min:
allvalues(isnan(allvalues)) = inf;
or use nanmin if the Statistics and Machine Learning Toolbox is installed.
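A worked run of this answer on three lines of the question's DATALow sample, as a sketch. It relies on regexp returning an empty token when the optional second value is absent (as the answer notes), and it applies the NaN-to-Inf replacement so it does not depend on how `min` treats NaN. Groups such as `R28/0500VP1500N` (two letters between the values) do not match this pattern as written, so the sample lines containing them are left out.

```matlab
% Three lines of the DATALow sample from the question (one METAR per row)
DATALow = {'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'; ...
           'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'; ...
           'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'};
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4})[A-Z]?(\d{4})?[A-Z]?\>', 'tokens');
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
alltokens = vertcat(tokens{:});               % 6x3 cell: runway, first value, second value ('' if absent)
allvalues = str2double(alltokens(:, [2 3]));  % '' becomes NaN
allvalues(isnan(allvalues)) = inf;            % does not rely on min ignoring NaN
minvalues = min(allvalues, [], 2)             % smallest RVR per runway group
```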
Accepted Answer
More Answers (1)
Andrei Bobrov
on 18 Oct 2016
tokens = regexp(DATALow, '\<(R\d{2})/(\d{4})[A-Z]+(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);
6 comments
Linus Dock
el 18 de Oct. de 2016
Linus Dock
el 18 de Oct. de 2016
Andrei Bobrov
el 18 de Oct. de 2016
Editada: Andrei Bobrov
el 18 de Oct. de 2016
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);
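A quick check of this pattern on two lines of the question's DATALow sample, written with the full `'UniformOutput'` option name instead of the abbreviation `'un'`. Unlike the accepted answer, it keeps only the number right after the slash, so for a range such as `0550V1300N` it returns the first value.

```matlab
% Two lines of the DATALow sample from the question
DATALow = {'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'; ...
           'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'};
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x) cat(1, x{:}), tokens, 'UniformOutput', false);
out{1}   % {'R10','0550'; 'R28','0500'}: first value of each range
out{2}   % {'R10','0200'; 'R28','0375'}
```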
Linus Dock
el 19 de Oct. de 2016
Linus Dock
el 19 de Oct. de 2016
Andrei Bobrov
el 19 de Oct. de 2016
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4,})[A-Z]*(?:(?:\d{4})[A-Z])?\>|(?:\<BECMG\>).*(\<\d{4}\>)', 'tokens');