Index for subfolders without *.pdf files

Hi guys. I have one folder which contains several hundred subfolders. I need to find the subfolders that do not contain a PDF file, and get their indices and locations. How can I do this? Thanks a lot.

Accepted Answer

Stephen23 on 11 Oct 2019
Edited: Stephen23 on 11 Oct 2019
Simpler and more robust:
D = 'path to the main folder';
S = dir(fullfile(D,'*'));
N = setdiff({S([S.isdir]).name},{'.','..'});
F = @(s)isempty(dir(fullfile(D,s,'*.pdf')));
X = cellfun(F,N)
It returns logical indices; simply use FIND to get subscript indices.
The names of the folders without .PDF files:
N(X)
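As a minimal sketch of the FIND step mentioned above (not part of the original answer), the following builds a throwaway folder tree under tempname and converts the logical index into subscript indices and full paths. The subfolder names 'a', 'b', 'c' and the file name 'report.pdf' are made up for illustration.

```matlab
% Build a small throwaway tree (folder and file names are made up).
D = tempname;
mkdir(fullfile(D,'a'));
mkdir(fullfile(D,'b'));
mkdir(fullfile(D,'c'));
fclose(fopen(fullfile(D,'b','report.pdf'),'wt')); % only 'b' holds a PDF

% Same approach as in the answer above.
S = dir(fullfile(D,'*'));
N = setdiff({S([S.isdir]).name},{'.','..'});
F = @(s)isempty(dir(fullfile(D,s,'*.pdf')));
X = cellfun(F,N);      % logical index, one element per subfolder

idx = find(X);         % subscript indices of the folders without a PDF
P = fullfile(D,N(X));  % full paths of those folders

rmdir(D,'s');          % remove the throwaway tree again
```

Here idx refers to positions in N (the sorted subfolder names), not to positions in the raw DIR output S.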

5 Comments

Johnny Birch on 11 Oct 2019
Hi Stephen Cobeldick, thanks a lot. Is it possible to look for *pdf in the third sub-level only? For instance in Mainfolder -> subfolder -> subsubfolder -> subsubsubfolder, even if there are several subfolders and subsubfolders? I hope this makes sense.
Stephen23 on 11 Oct 2019
Edited: Stephen23 on 11 Oct 2019
" Is it possible to look for *pdf in the third sub-level only?"
Of course. Most likely it would be easier to implement using nested loops.
How many subfolders are there? (constant or variable?)
How many subsubfolders per subfolder? (constant or variable?)
How many subsubsubfolders per subsubfolder? (constant or variable?)
What do you expect the output to look like? (Please give an exact example.)
Johnny Birch on 11 Oct 2019
Hi Stephen Cobeldick, I really appreciate your help.
The data structure is as shown below (and also attached as a .zip file).
There will always be 3 Subfolders. The number of SubSubFolders and SubSubSubFolders is variable. So what I would like to do is:
1) Always start from the MainFolder
2) Search in all SubSubSubFolders for PDF files whose names always start with 'Target'. It should NOT search for PDF files in the SubSubSubSubFolders
3) The final output should be a list of those SubSubSubFolders that do NOT contain a 'Target' PDF file. In this case:
  • MainFolder\Subfolder1\SubSubFolder1_1_1\SubSubSubFolder1_1_3
  • MainFolder\Subfolder3\SubSubFolder3_1_1\SubSubSubFolder3_1_1
  • MainFolder\Subfolder3\SubSubFolder3_1_2\SubSubSubFolder3_1_2
[Attached image: DataStructure.png]
Stephen23
This should get you started, please adjust it to fit your exact structure and needs:
D = './MainFolder'; % path to the main folder.
out = {};
ds1 = dir(fullfile(D,'*'));
dn1 = setdiff({ds1([ds1.isdir]).name},{'.','..'});
for k1 = 1:numel(dn1) % loop over subfolders.
    ds2 = dir(fullfile(D,dn1{k1},'*'));
    dn2 = setdiff({ds2([ds2.isdir]).name},{'.','..'});
    for k2 = 1:numel(dn2) % loop over subsubfolders.
        ds3 = dir(fullfile(D,dn1{k1},dn2{k2},'*'));
        dn3 = setdiff({ds3([ds3.isdir]).name},{'.','..'});
        for k3 = 1:numel(dn3) % loop over subsubsubfolders.
            tmp = fullfile(D,dn1{k1},dn2{k2},dn3{k3});
            fnm = dir(fullfile(tmp,'Target*.pdf'));
            if isempty(fnm)
                out{end+1} = tmp;
            end
        end
    end
end
Johnny Birch on 13 Oct 2019
Hi Stephen Cobeldick, your code works perfectly. Thanks a lot, you saved me a lot of time.
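For readers adapting the three-level search, here is one possible variation (a sketch, not the code posted in this thread) that walks down level by level instead of writing one loop per level. The helpers pick and subs, the temporary tree, and all folder and file names in it are hypothetical.

```matlab
% Throwaway tree: three levels, plus one deeper folder that must be ignored.
D = tempname;
mkdir(fullfile(D,'S1','SS1','SSS1'));
mkdir(fullfile(D,'S1','SS1','SSS2'));
mkdir(fullfile(D,'S2','SS2','SSS3','deeper'));
fclose(fopen(fullfile(D,'S1','SS1','SSS1','Target_a.pdf'),'wt'));
fclose(fopen(fullfile(D,'S2','SS2','SSS3','deeper','Target_b.pdf'),'wt'));

% Hypothetical helpers: subfolder names of p as a row, without '.' and '..'.
pick = @(S) reshape(setdiff({S([S.isdir]).name},{'.','..'}),1,[]);
subs = @(p) pick(dir(fullfile(p,'*')));

level = {D};                  % folders at the current depth
for depth = 1:3               % descend exactly three levels
    next = {};
    for k = 1:numel(level)
        kids = cellfun(@(n)fullfile(level{k},n),subs(level{k}), ...
            'UniformOutput',false);
        next = [next, kids]; %#ok<AGROW>
    end
    level = next;
end

% Keep the level-3 folders that hold no 'Target*.pdf' file directly.
noTarget = cellfun(@(p)isempty(dir(fullfile(p,'Target*.pdf'))),level);
out = level(noTarget);

rmdir(D,'s');                 % remove the throwaway tree again
```

Changing the upper bound of the depth loop changes which level is searched. The PDF in the fourth-level folder 'deeper' is not found because DIR is only called on the level-3 folders themselves.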


More Answers (1)

Hemant Verma on 11 Oct 2019
This should work.
% specify path to the source folder (in your case the one which contains 100 subfolders)
rootFolderPath = './RootFolder';
% get all subfolders and files (if any) inside root folder
allFolders = dir(rootFolderPath);
% initialise an empty variable to store indices of folders without PDF file
foldersWithoutPDF = [];
% for each element of allFolders
for i = 3:length(allFolders)
    % check whether it is a folder and it does not contain any pdf file
    if ( isdir([rootFolderPath filesep allFolders(i).name]) && ...
            isempty(dir([rootFolderPath filesep allFolders(i).name filesep '*.pdf'])) )
        foldersWithoutPDF = [foldersWithoutPDF ; i-2];
    end
end
The variable "foldersWithoutPDF" should contain the indices of all subfolders without a PDF file.

1 Comment

Stephen23 on 11 Oct 2019
Edited: Stephen23 on 11 Oct 2019
Note that this line is fragile/buggy:
for i = 3:length(allFolders)
because its author incorrectly assumed that the first two elements of allFolders are always the folder shortcuts '.' and '..'. In fact:
  1. there is no guarantee that any particular OS will return those shortcuts.
  2. there is no guarantee that they will be returned as the first two names. In fact it is trivial to create some file/folder names which demonstrate that they are not always the first two returned names:
>> fclose(fopen('+test.txt','wt'));
>> fclose(fopen('-test.txt','wt'));
>> fclose(fopen('@test.txt','wt'));
>> S = dir('*');
>> S.name
ans = +test.txt
ans = -test.txt
ans = .
ans = ..
ans = @test.txt
Also note that fullfile is recommended for creating file paths, rather than string concatenation.
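To illustrate both points, here is a minimal sketch (an assumption about how the loop could be rewritten, not the original author's code) that selects subfolders by name and type rather than by position in the DIR output, and builds every path with fullfile. The temporary tree and the names '-first', 'second' and 'doc.pdf' are made up; '-first' is chosen because it sorts before '.' and '..'.

```matlab
% Throwaway root folder standing in for the real one.
rootFolderPath = tempname;
mkdir(fullfile(rootFolderPath,'-first'));   % sorts before '.' and '..'
mkdir(fullfile(rootFolderPath,'second'));
fclose(fopen(fullfile(rootFolderPath,'second','doc.pdf'),'wt'));

allItems = dir(rootFolderPath);
% Select subfolders by type and name, not by position in the listing.
isSub = [allItems.isdir] & ~ismember({allItems.name},{'.','..'});
subNames = {allItems(isSub).name};

foldersWithoutPDF = {};
for k = 1:numel(subNames)
    % fullfile inserts the correct file separator for the current OS.
    if isempty(dir(fullfile(rootFolderPath,subNames{k},'*.pdf')))
        foldersWithoutPDF{end+1} = subNames{k}; %#ok<AGROW>
    end
end

rmdir(rootFolderPath,'s');   % remove the throwaway tree again
```

This version collects folder names instead of offsets such as i-2, so the result does not depend on where '.' and '..' happen to appear in the listing.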



Release

R2019a

Asked: 11 Oct 2019

Commented: 13 Oct 2019
