How do I find a folder with a specified string?

Daniel Bridges on 30 Jan 2018
Edited: Stephen23 on 31 Jan 2018
I think I need your help using regexp. My goal is to find the RTPLAN DICOM file and read particular metadata from it. To get the full folder name to pass to fullfile, and then to dicominfo, I tried the following, which failed with an error I don't understand:
>> result = regexp(listing.name,'RTPLAN','match')
Error using regexp
Invalid option for regexp:
doe^john_anon53250_ct_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000.
The folder containing the string 'RTPLAN' clearly exists: it is the penultimate entry in the following directory listing, which we get after exporting anonymized patient data from MIM Maestro:
>> DICOMdatafolder = '/home/sony/Documents/research/data/DICOMfiles/5';
listing = dir(DICOMdatafolder);
listing.name
ans =
'.'
ans =
'..'
ans =
'DOE^JOHN_ANON53250_CT_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n132__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00001'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00002'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00003'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00004'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00005'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00006'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00007'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00008'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00009'
ans =
'DOE^JOHN_ANON53250_RTDOSE_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__0000A'
ans =
'DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
ans =
'DOE^JOHN_ANON53250_RTst_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000'
So the task I'm trying to accomplish is: Given a list of folder names like this, grab the one that contains 'RTPLAN' so it can be used in fullfile. What was wrong with my use of regexp?

Accepted Answer

Stephen23 on 30 Jan 2018
Edited: Stephen23 on 30 Jan 2018
The problem is that listing.name expands to a comma-separated list, so your code
regexp(listing.name,'RTPLAN','match')
is exactly equivalent to
regexp(listing(1).name, listing(2).name, listing(3).name, listing(4).name, listing(5).name, listing(6).name, ... , 'RTPLAN','match')
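where each element of the structure listing supplies its name field as a separate input argument to regexp. This produces far too many inputs for regexp, and supplies them in meaningless positions: the first name is taken as the text to search, the second as the expression, and every later name as an option. That is why the error message quotes the third folder name as an invalid option.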
Comma-separated lists were introduced in my answer to your earlier question:
The solution is to put all of those elements of that list into one cell array, e.g.:
result = regexp({listing.name},'RTPLAN','match')
where
{listing.name}
is of course equivalent to
{listing(1).name, listing(2).name, listing(3).name, ...}
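As a small illustration of the difference, here is a minimal sketch using a hand-built struct array in place of the output of dir. The shortened folder names are made up for the example and are not from the question.

```matlab
% Hypothetical struct array standing in for the output of dir.
listing = struct('name', {'.', '..', 'A_CT_0', 'A_RTPLAN_0', 'A_RTst_0'});

% listing.name expands to five separate values; the braces collect
% them into one 1x5 cell array that regexp accepts as a single input.
names = {listing.name};

% One output cell per name; cells for non-matching names are empty.
result = regexp(names, 'RTPLAN', 'match');
```

Here result has the same size as names, and only the fourth cell is non-empty.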
This is explained in the MATLAB documentation that I linked to in my earlier answer. I would recommend reviewing what comma-separated lists are, because judging by your other question they are causing you some confusion (in particular comma-separated lists are not one variable). You might like to start here:
  4 Comments
Daniel Bridges on 31 Jan 2018
Edited: Daniel Bridges on 31 Jan 2018
I am still struggling to elegantly obtain the entire string. find cannot be used with cell arrays and seemingly must be used with matrices; cell2mat collapses the cell array returned by regexp, losing the information about which directory contains the matching string. Looking again at the regexp documentation, I have not yet found how to pull the entire string containing 'RTPLAN'. I think I should use isempty to get a result to index into dir's output, but I must first learn the syntax for dealing with cells.
I was trying to avoid a for loop (I seem to always use them and I am concerned it isn't making use of MATLAB's indexing), but this works:
listing = dir(DICOMdatafolder);
result = regexp({listing.name},'RTPLAN');
for loop = 1:numel(listing)
    if ~isempty(result{loop})
        correctfolder = loop;
    end
end
listing(correctfolder).name
Stephen23 on 31 Jan 2018
Edited: Stephen23 on 31 Jan 2018
Because in this case the input to regexp is a cell array of strings, the output is a cell array of the same size. One of the cells would be non-empty (containing either the matching string, the substring, or its index, depending on what output you select, and assuming one matched filename). You would then have to do some post-processing to get the contents of that one cell, such as checking which cells are empty to generate a logical index:
>> C = {listing.name};
>> idx = ~cellfun('isempty',regexp(C,'RTPLAN','once'));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
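To illustrate how the selected output changes what the non-empty cell holds, here is a minimal sketch with two made-up names. The pattern '.*RTPLAN.*' is only one possible way to make regexp return the whole name, not the only one.

```matlab
C = {'A_CT_0', 'A_RTPLAN_0'};   % made-up folder names

% Default output with 'once': the start index of the match.
startIdx = regexp(C, 'RTPLAN', 'once');

% 'match' returns the matched text, here just the substring.
matched = regexp(C, 'RTPLAN', 'match', 'once');

% Widening the pattern makes the match cover the entire name.
whole = regexp(C, '.*RTPLAN.*', 'match', 'once');
```

In all three cases the first cell is empty and the second is not, so the same isempty test yields the logical index.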
However for matching such a simple substring regexp is overkill: here are two ways to match that filename, based on faster strfind:
From cell array:
>> C = {listing.name};
>> idx = ~cellfun('isempty',strfind(C,'RTPLAN' ));
>> C{idx}
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
From structure:
>> idx = ~cellfun('isempty',strfind({listing.name},'RTPLAN' ));
>> listing(idx).name
ans = DOE^JOHN_ANON53250_RTPLAN_2013-02-04_100401_for.use.in.interfractional.blurring.study_planned.treatment_n1__00000
To which you should also add some error checking (otherwise the last step could produce multiple variables in a comma-separated list), so whichever one you choose put this immediately after idx is defined:
assert(nnz(idx)==1,'less than or more than one file found')
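Putting the pieces together for the original goal (a folder path usable with fullfile), a minimal sketch might look like the following. It builds a throwaway folder tree under tempname so that it runs anywhere; the folder names are made up, and the name of the DICOM file inside the folder is not given in this thread, so the dicominfo call is left as a comment with a placeholder.

```matlab
% Build a temporary folder tree for the example (hypothetical names).
DICOMdatafolder = tempname;
mkdir(DICOMdatafolder);
mkdir(DICOMdatafolder, 'DOE_CT_00000');
mkdir(DICOMdatafolder, 'DOE_RTPLAN_00000');

listing = dir(DICOMdatafolder);
listing = listing([listing.isdir]);   % keep folders only
names = {listing.name};

% Logical index of the names that contain the substring.
idx = ~cellfun('isempty', strfind(names, 'RTPLAN'));
% On R2016b or later, contains(names, 'RTPLAN') gives the same index.
assert(nnz(idx) == 1, 'Expected exactly one RTPLAN folder, found %d.', nnz(idx))

planfolder = fullfile(DICOMdatafolder, names{idx});
% info = dicominfo(fullfile(planfolder, planfile));  % planfile is a placeholder

rmdir(DICOMdatafolder, 's');   % remove the temporary tree
```

Filtering on isdir first is a design choice that keeps ordinary files with 'RTPLAN' in their names from tripping the assert.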
