How can I arrange my output from regexp stored in multiple cells in a for loop?

Hi, I am using regexp to extract and match data from a text string with this code:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens'); %find RVR in DATALow
SortTokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %stack the RVR pairs of each row vertically
My output is stored in cells within a cell like this:
[]
[]
[]
<4x2 cell>
<4x2 cell>
<4x2 cell>
[]
[]
Each non-empty output cell contains data like this:
'R01L' '1500'
'R19R' '1500'
'R01R' '1300'
'R19L' '1500'
However, the output cells vary in size and can also look like this:
[]
<1x2 cell>
<1x2 cell>
[]
My goal is to extract the data with a for loop that takes the size of each output cell into account and stores the result in a cell, using this code:
NoRUNWAY=ones(1,length(SortTokens)); %preallocate a vector of ones for speed
for j=1:length(SortTokens) %for each cell in SortTokens
NoRws=size(SortTokens{j,1},1); %count the rows (runways) in each cell; length would return 2 for a 1x2 cell
if NoRws>0 %if larger than zero
NoRUNWAY(j)=NoRws; %store the number of rows
end
end
isemp = cellfun('isempty', tokens); %find all empty cells in tokens
for l=1:length(SortTokens);
RWYnum=NoRUNWAY(l);
for k=1:RWYnum
tempRUNWAY = cellfun(@(x) x{k,1}, SortTokens(~isemp), 'uni', 0);
tempRVR = cellfun(@(x) x{k,2}, SortTokens(~isemp), 'uni', 0);
RVR = nan(size(SortTokens));
RVR(~isemp) = cellfun(@str2num, tempRVR);
RVRnan=isnan(RVR);
RVRnanx=find(RVRnan);
RVR(RVRnanx)=9999;
RWYcell{1,k}=tempRUNWAY(1);
RVRcell{1,k}=RVR;
end
end
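For comparison, a loop of this kind can work if it takes each cell's row count with size(c, 1) (length returns 2 for a 1x2 cell) and only indexes rows that actually exist. This is a minimal sketch rather than the asker's code: the three-element SortTokens below is made-up sample data, and maxRwy = 4 is an assumed upper bound on runways per report.

```matlab
% Hypothetical sample standing in for SortTokens: one cell per METAR line
SortTokens = {[]; {'R01L' '1500'; 'R19R' '1500'}; {'R10' '0550'}};
maxRwy  = 4;                                  % assumed maximum number of runways per report
RWYcell = cell(numel(SortTokens), maxRwy);    % runway ids, one row per METAR line
RVRmat  = nan(numel(SortTokens), maxRwy);     % RVR values, NaN where no runway is reported
for j = 1:numel(SortTokens)
    c = SortTokens{j};
    nRwy = size(c, 1);                        % number of rows (0 for an empty cell)
    for k = 1:min(nRwy, maxRwy)
        RWYcell{j, k} = c{k, 1};              % runway designator
        RVRmat(j, k)  = str2double(c{k, 2});  % RVR value as a number
    end
end
```

Because the inner loop runs only up to the actual row count, empty and 1x2 cells need no special case.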
The largest output cell is of size
<4x2 cell>
I would like to store the data in a new cell array with four columns and ultimately compare these values with other measurements.
Does this make sense? These are Runway Visual Range (RVR) measurements for multiple runways at different airports, and I would like to compare them with the meteorological visibility at the same airports. The data I am using, called DATALow, looks like this:
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'
I also just realized that my regexp misses most of the RVR groups because it only matches runway designators of the form:
R19L/
that is, with a letter after the runway number, which most airports don't use. Can someone please help with this?

15 comments

What are you trying to extract from the DATALow text? I'm not clear on what data you're interested in.
Hi George! Please look here.
I would like to extract the following from the string:
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
Runway designators: R10 and R28. The lowest RVR value for each runway: 0550 and 0500.
I have already extracted the meteorological visibility (0500) from this string. This is roughly what I would like to do:
tokens = regexp(DATALow, '((R\d{2}[A-Z])|(R\d{2}))/(\w(\d{4})|(\d{4})?:((\d{4})?[A-Z]+)?(\d{4})[A-Z]|(\d{4}))', 'tokens');
It's a bit complex, but the variability of the RVR group makes it hard to construct a simple expression.
Well, modifying the regex to take into account the optional letter at the end of the 'R' group is not a problem:
regexp(DATALow, '\<(R\d{2}[A-Z]?)/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');
just one extra ? after the letter range.
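A quick check of the effect of that ?, sketched on two lines taken from the data in this thread (the variable names are illustrative): the original pattern finds nothing in the ESNS line, while the modified one matches runways with and without the trailing letter. Note that this pattern keeps the second number of a range such as 0550V1300N.

```matlab
s = {
    'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
    'METAR ESTL 070320Z AUTO 20004KT 0350 R11R/0800VP1500N R29L/0800VP1500N FG SKC 13/13 Q1002'};
oldPat = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';   % letter after runway number required
newPat = '\<(R\d{2}[A-Z]?)/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';  % letter after runway number optional
nOld = cellfun(@numel, regexp(s, oldPat, 'match'));   % number of matches per line
tokens = regexp(s, newPat, 'tokens');
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);  % one Nx2 cell per line
```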
Thank you! I have added the extra ? to my expression to handle the 'R' group:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\w(\d{4})|(\d{4})?:((\d{4})?[A-Z]+)?(\d{4})[A-Z]|(\d{4}))', 'tokens');
It almost works now; just one tiny thing is left. In this string:
'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
How do I exclude the preceding 'P'?
I get an extra 'P' in front of the '2000' in my output:
'R01' 'P2000'
'R19' 'P2000'
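One way to drop the P, sketched here under the assumption that P is the only prefix letter that occurs, is to match it outside the capturing parentheses so that it is consumed but not returned. The pattern below is an illustration, not the expression used elsewhere in this thread; it also skips an optional second value, so it returns the first number of a range such as 0550V1300N.

```matlab
s = {
    'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
    'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'};
% P? sits outside the parentheses: it is matched but not captured
pat = '\<(R\d{2}[A-Z]?)/P?(\d{4})(?:[A-Z]*\d{4})?[A-Z]?\>';
tokens = regexp(s, pat, 'tokens');
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
```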
I might as well give you more of DATALow, just in case:
'METAR ESNS 010020Z AUTO VRB01KT 0050 R10/0375V0550N R28/0150V0325N FG VV000 09/08 Q1011'
'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
'METAR ESNS 010150Z AUTO 30002KT 0150 R10/0200N R28/0500VP1500N FG VV001 10/09 Q1012'
'METAR ESNS 010220Z AUTO 00000KT 0300 R10/0450V0800N R28/0300V0650D FG VV000 09/09 Q1012'
'METAR ESNS 010250Z AUTO 00000KT 0050 R10/0550V0800N R28/0175N FG VV000 10/09 Q1012'
'METAR ESNS 010320Z AUTO 00000KT 0050 R10/0200N R28/0375N FG VV001 10/09 Q1012'
'METAR ESNS 010350Z AUTO 00000KT 0100 R10/0250N R28/0250N FG VV001 10/10 Q1012'
'METAR ESNS 010420Z AUTO VRB02KT 0150 R10/0300N R28/0275N FG VV001 11/11 Q1012'
'METAR ESNS 010450Z AUTO 00000KT 0250 R10/0600VP1500N R28/0500V0800N FG VV001 12/11 Q1012'
'METAR ESNY 010120Z AUTO 29004KT 0150 R30/0500N FG VV001 12/12 Q1010'
'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'
'METAR ESNY 010220Z AUTO 31001KT 0200 R30/0450N FG VV001 12/12 Q1011'
'METAR ESNY 010250Z AUTO 26002KT 0150 R30/0350N FG VV001 12/12 Q1011'
'METAR ESNY 010320Z AUTO 28004KT 0350 R30/0700N FG VV001 12/12 Q1011'
'METAR ESNY 010350Z AUTO 29004KT 0500 R30/0750 FG VV001 12/12 Q1011'
'METAR ESNY 010420Z AUTO 30004KT 1300 R30/1000VP1500U VV002 12/12 Q1012 REFG'
'METAR ESNY 010450Z AUTO 29006KT 1200 R30/0800VP1500D VV001 12/12 Q1012 REFG'
'METAR ESNY 010520Z AUTO 30005KT 1000 R30/0750VP1500D VV001 12/12 Q1012 REFG'
'METAR ESNY 012120Z AUTO 28001KT 1300 R30/1100VP1500 SKC 11/11 Q1018 REFG'
'METAR ESPA 010250Z 31003KT 0300 R14/0450N R32/0250N FG VV010 11/11 Q1012'
'METAR ESPA 010320Z 36002KT 0250 R14/0400V0600N R32/0375N FG VV001 12/12 Q1012'
'METAR ESPA 010350Z VRB02KT 0300 R14/0500N R32/0700N FG OVC001 12/12 Q1012'
'METAR ESPA 010420Z VRB02KT 0500 R14/0750V1500U R32/P1500N FG OVC001 13/13 Q1013'
'METAR ESMK 022250Z AUTO 07004KT 0200 R01/0350VP2000 R19/P2000D FG NCD 12/12 Q1017'
'METAR ESMK 022320Z AUTO 06004KT 0100 R01/0350 R19/P2000D NCD 12/11 Q1017'
'METAR ESMK 022350Z AUTO 06003KT 0050 R01/0300N R19/1200VP2000D FG NCD 12/11 Q1017'
'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
'METAR ESMK 060120Z AUTO 18002KT 0150 R01/0450V0750 R19/P2000D FG FEW067/// SCT079/// 12/11 Q1008'
'METAR ESMK 060150Z AUTO 35003KT 320V030 0300 R01/0800V1800 R19/P2000U BCFG FEW072/// 11/11 Q1008'
'METAR ESMK 060220Z AUTO VRB02KT 1200 R01/P2000D R19/P2000D FEW001/// SCT004/// 12/12 Q1008'
'METAR ESMK 060250Z AUTO VRB02KT 0300 R01/P2000 R19/P2000D FEW091/// 12/11 Q1007'
'METAR ESMK 060320Z AUTO 19003KT 0550 R01/0400V0750 R19/P2000U NCD 12/12 Q1007'
'METAR ESDF 072050Z AUTO 21002KT 0300 R01/1000VP1500N R19/P1500N FG FEW100/// 13/12 Q1003'
'METAR ESDF 072120Z AUTO 00000KT 1000 R01/0750VP1500N R19/P1500N BKN100/// 13/12 Q1003'
'METAR ESDF 072150Z AUTO 22005KT 0350 R01/P1500N R19/P1500D FG SCT043/// BKN110/// 13/12 Q1003'
'METAR COR ESGP 070247Z 00000KT 0100 R01/0400 R19/0350V0800 BCFG SCT100 13/12 Q1001'
'METAR ESGP 070250Z 00000KT 0100 R01/0400 R19/0350V0800 FG SCT100 13/12 Q1001'
'METAR ESGP 070320Z 00000KT 0150 R01/0800V1600 R19/0400 BCFG FEW001 11/10 Q1001'
'METAR ESGP 070350Z 00000KT 0100 R01/0375 R19/0225V0350 BCFG NSC 11/09 Q1001'
'METAR ESMK 070250Z AUTO 11002KT 0600 R01/0900V1700D R19/P2000U BR FEW009/// SCT012/// 15/14 Q1001'
'METAR ESMK 070320Z AUTO VRB01KT 0600 R01/0650V1200 R19/P2000N BR FEW008/// SCT010/// 14/14 Q1001'
'METAR ESMK 072220Z AUTO 07001KT 0600 R01/0800V1500 R19/P2000D BR FEW074/// SCT110/// 14/13 Q1003'
'METAR ESMK 072250Z AUTO VRB02KT 1300 R01/P2000D R19/P2000D BCFG SCT100/// 13/13 Q1003'
'METAR ESMK 072320Z AUTO 20002KT 0500 R01/1000VP2000D R19/P2000D SCT090/// BKN100/// 13/13 Q1003'
'METAR ESMK 072350Z AUTO 19002KT 0300 R01/0900 R19/P2000D FG FEW041/// BKN089/// 13/13 Q1003'
'METAR ESMT 072050Z AUTO 06003KT 0700 R01/P2000N R19/0900V1900U FG VV000 14/13 Q1003'
'METAR ESNU 072350Z 00000KT 0800 R14/P1500N R32/P1500N MIFG NSC 03/02 Q1006'
'METAR ESTA 070150Z AUTO 15004KT 1500 BR FEW007/// SCT010/// BKN030/// 15/14 Q1001'
'METAR ESTL 070320Z AUTO 20004KT 0350 R11R/0800VP1500N R29L/0800VP1500N FG SKC 13/13 Q1002'
'METAR ESMK 080050Z AUTO VRB01KT 0600 R01/P2000D R19/P2000U FEW025/// BKN041/// 13/12 Q1002'
At this point, it's probably better to make the regex less specific, rather than more specific. The following works on that larger sample of data:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens');
Thanks a lot! This works quite well, but I'm still having problems extracting all the relevant data. There seem to be a lot of exceptions in the text strings. Here's an example:
METAR ESSA 181550Z 13007KT 1100 R01L/P1500N R19R/1400N R01R/1400N R19L/P1500N SN VV007 M01/M01 Q1013 R01L/590249 R08/590247 R01R/12//70 BECMG 9999 NSW BKN012
Unfortunately, your expression above also captures the '9999' after BECMG. I would also like to extract the first value of the RVR group in the string:
'R11R/0800VP1500N'
that is, '0800', if that is possible. At the moment I get the second value, '1500', which is less important to me. I guess it would be easier if you had access to the entire DATALow cell?
For the moment, DATALow is of size 177096x1.
The BECMG problem is easy to fix:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[^ ]*?(\d{4})[A-Z]?\>', 'tokens');
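The difference between the two patterns can be seen on the ESSA string from the comment above. In this sketch, .*? is allowed to run across spaces until it finds four digits, so R01R/12//70 ends up paired with the 9999 after BECMG, whereas [^ ]*? cannot leave the current space-delimited group. Runway-state groups such as R01L/590249 still produce a spurious value (0249) with both patterns.

```matlab
s = ['METAR ESSA 181550Z 13007KT 1100 R01L/P1500N R19R/1400N R01R/1400N R19L/P1500N ' ...
     'SN VV007 M01/M01 Q1013 R01L/590249 R08/590247 R01R/12//70 BECMG 9999 NSW BKN012'];
lazyAny     = '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>';     % .*? may run across spaces
lazyNoSpace = '\<(R\d{2}[A-Z]?)/[^ ]*?(\d{4})[A-Z]?\>';  % [^ ]*? stays inside one group
ta = regexp(s, lazyAny, 'tokens');     a = vertcat(ta{:});   % Nx2 cell of runway/value pairs
tb = regexp(s, lazyNoSpace, 'tokens'); b = vertcat(tb{:});
disp(a(end, :))   % R01R paired with the 9999 after BECMG
```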
From your previous examples and questions, I assumed that the second part of the RVR is what you wanted, not the first part. The regexp is explicitly designed to discard that first part if present. It can certainly be changed to return the first part instead of the second.
However, at this point it would be much better if you explained thoroughly the grammar of that bit of the runway string. What is allowed and not allowed in each part, which parts or combinations of parts are optional, and of course, which part do you actually want?
Hello again! Sorry for being unclear. I would like to extract the first RVR value, which should be the lower of the two given in a group like this one:
'R11R/0800VP1500N'
These are the RVR group formats I have found so far:
'R30/0450'
'R10/0375V0550N'
'R01L/P1500N'
'R01R/0750V1000N'
'R19L/0900N'
'R19R/P1500N'
'R03/P1500N'
The code I'm currently using is this:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>', 'tokens');
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all pairs of each row vertically
The erroneous data I'm currently extracting are the '9999' after the 'BECMG' group, as mentioned above, and 'R01' with '0023' at the end of this string:
'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'
I think I found the magic expression! This seems to work:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?[^ ]*?(\d{4})*[A-Z]\>', 'tokens');
It's a combination of both Guillaume's and Andrei's expressions. A huge thank you for your patience and knowledge!
Just one tiny thing left to deal with: how do I choose the first of the two values in the group?
'R11R/0800VP1500N'
Thank you!
I think I solved it:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})*[A-Z]*[^ ]*?\>', 'tokens');
You cannot create a regular expression (even a dynamic one) that would match the smaller of the two numerical groups if both are present. You would have to return both groups and select the minimum afterward.
I believe the following would suit:
%the regexp now returns three tokens per match, the last token of each match may be empty
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4})[A-Z]*(\d{4})?[A-Z]?\>', 'tokens'); %[A-Z]* between the values so that groups like 0800VP1500N match
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:}); %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, [2 3])); %convert RVR tokens to number. If only one RVR per match, the second token is converted to NaN
minvalues = min(allvalues, [], 2);
If you are using an old version of MATLAB where min does not ignore NaNs by default, replace the NaNs with Inf before calling min:
allvalues(isnan(allvalues)) = inf;
or use nanmin if the appropriate toolbox is installed.
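As a further sketch of the "return both values, take the minimum afterward" idea: here the whole RVR group is captured as one token, and every four-digit number in it is then extracted with a second regexp call, which avoids relying on how empty optional tokens are returned. The pattern allows any run of letters between the two numbers (the VP in 0800VP1500N) and requires the group to end at a word boundary, which rejects runway-state groups such as R01/790023. The three-line DATALow below is sample data from this thread, not the full data set.

```matlab
DATALow = {
    'METAR ESTL 070320Z AUTO 20004KT 0350 R11R/0800VP1500N R29L/0800VP1500N FG SKC 13/13 Q1002'
    'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'
    'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'};
% Token 1: runway id; token 2: whole RVR group (optional prefix, one or two values, trend letter)
pat = '\<(R\d{2}[A-Z]?)/([A-Z]?\d{4}(?:[A-Z]*\d{4})?[A-Z]?)\>';
tokens = regexp(DATALow, pat, 'tokens');
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
allpairs = vertcat(pairs{:});   % empty rows (no RVR group) drop out here
% Pull every 4-digit number out of each RVR group and keep the smallest
minRVR = cellfun(@(g) min(str2double(regexp(g, '\d{4}', 'match'))), allpairs(:, 2));
```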


Accepted Answer

Guillaume on 18 Oct 2016
Edited: Guillaume on 18 Oct 2016
As per my comment on the question, changing the regex to account for the optional letter is not a problem.
To produce your output, I believe the following would work:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens'); %find RVR in DATALow
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false); %concatenate all pairs of each row vertically
alltokens = vertcat(tokens{:}); %concatenate it all regardless of row, note that this removes empty rows
allvalues = str2double(alltokens(:, 2)); %convert RVR value from string to number. str2double is a lot safer than str2num and can work on cell arrays
destcol = repelem((1:numel(tokens))', cellfun(@(c) size(c, 1), tokens)); %find column destination for each row of alltokens and allvalues
[runway, ~, destrow] = unique(alltokens(:, 1)); %get unique runway id and row destination for each row of alltokens and allvalues
visibility = nan(numel(runway), numel(tokens)); %initialise output matrix.
%visibility = zeros(numel(runway), numel(tokens)) + 9999; %if you want 9999 instead
visibility(sub2ind(size(visibility), destrow, destcol)) = allvalues;
If I remember correctly, you're using an old version of MATLAB, which may not have repelem, in which case:
repelem = @(v, r) cell2mat(arrayfun(@(n, r) repmat(n, 1, r), v, r, 'UniformOutput', false)')';
for this particular case.
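To illustrate what the code above produces, here is a small run on three lines from the question's data, one of which has no RVR group. Rows of visibility follow the sorted runway ids in runway, columns follow the lines of DATALow, and NaN marks a runway not reported on that line. This sketch uses the answer's regex, so it keeps the second value of a range such as 0550V1300N.

```matlab
% Three sample lines from the question; the ESTA line has no RVR group
DATALow = {
    'METAR ESNS 010050Z AUTO 00000KT 0500 R10/0550V1300N R28/0500V0750N FG VV000 09/08 Q1011'
    'METAR ESTA 070150Z AUTO 15004KT 1500 BR FEW007/// SCT010/// BKN030/// 15/14 Q1001'
    'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'};
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]?\>', 'tokens');
tokens = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);
alltokens = vertcat(tokens{:});                 % 3x2: R10/1300, R28/0750, R30/0450
allvalues = str2double(alltokens(:, 2));
destcol = repelem((1:numel(tokens))', cellfun(@(c) size(c, 1), tokens));  % METAR line of each pair
[runway, ~, destrow] = unique(alltokens(:, 1));  % sorted runway ids and the row of each pair
visibility = nan(numel(runway), numel(tokens));
visibility(sub2ind(size(visibility), destrow, destcol)) = allvalues;
disp(visibility)   % rows R10, R28, R30; columns = the three METAR lines
```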
Edit: new, more versatile regex.

1 comment

Awesome, Guillaume! This is just what I needed. Just one more thing: I'm using this expression instead:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\w(\d{4})|(\d{4})?:((\d{4})?[A-Z]+)?(\d{4})[A-Z]|(\d{4}))', 'tokens');
However, as mentioned above, I have some strings like this:
'METAR ESMK 060020Z AUTO 00000KT 0800 R01/P2000D R19/P2000D FEW067/// 12/12 Q1008'
I'm getting an unwanted extra 'P' before my '2000', like this:
'R01' 'P2000'
'R19' 'P2000'
Otherwise it does exactly what I want!


More Answers (1)

tokens = regexp(DATALow, '\<(R\d{2})/(\d{4})[A-Z]+(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);

6 comments

Thank you, this almost works too, but it misses the data from strings like this:
'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'
I put an extra '?' after [A-Z] and it solved that problem. Now I have another string that looks like this:
'METAR ESNY 010420Z AUTO 30004KT 1300 R30/1000VP1500U VV002 12/12 Q1012 REFG'
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>', 'tokens');
out = cellfun(@(x)cat(1,x{:}),tokens,'un',0);
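The effect of changing [A-Z]+ to [A-Z]* can be checked on the two ESNY lines above; this sketch simply counts matches per line. With [A-Z]+ a letter is required after the first four digits, so R30/0450 (no trend letter) is missed.

```matlab
s = {
    'METAR ESNY 010150Z AUTO 28003KT 0200 R30/0450 FG VV001 12/12 Q1010'
    'METAR ESNY 010420Z AUTO 30004KT 1300 R30/1000VP1500U VV002 12/12 Q1012 REFG'};
plusPat = '\<(R\d{2})/(\d{4})[A-Z]+(?:(?:\d{4})[A-Z])?\>';        % letter required after first value
starPat = '\<(R\d{2}[A-Z]?)/(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>';  % letters optional
nPlus = cellfun(@numel, regexp(s, plusPat, 'match'));   % matches per line
nStar = cellfun(@numel, regexp(s, starPat, 'match'));
t = regexp(s{2}, starPat, 'tokens');                    % first value of the range is captured
```

Neither pattern matches a P prefix such as R01L/P1500N, because (\d{4}) must follow the slash directly.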
This is almost working:
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/.*?(\d{4})[A-Z]*(?:(?:\d{4})[A-Z])?\>', 'tokens');
It seems to extract the '9999' group after 'BECMG' from this string:
METAR ESSA 181550Z 13007KT 1100 R01L/P1500N R19R/1400N R01R/1400N R19L/P1500N SN VV007 M01/M01 Q1013 R01L/590249 R08/590247 R01R/12//70 BECMG 9999 NSW BKN012
I would rather it not extract that group, if possible; I mentioned this above as well, in a comment to Guillaume. Your expression does get the right RVR value from each string, namely the first value in, for example:
'R11R/0800VP1500N'
Which is just what I'm looking for.
Also, '0023' in this string is wrongly extracted:
'METAR ESGJ 102247Z 35015KT 1200 SHSN FEW006 BKN010 M04/M04 Q1009 R01/790023'
That group contains information about the runway condition and braking action, which I'm not interested in at the moment.
Thank you!
tokens = regexp(DATALow, '\<(R\d{2}[A-Z]?)/[A-Z]?(\d{4,})[A-Z]*(?:(?:\d{4})[A-Z])?\>|(?:\<BECMG\>).*(\<\d{4}\>)', 'tokens');


Asked: 18 Oct 2016

Commented: 20 Oct 2016
