MATLAB Answers

How to find duplicate file names(same name but different folder address), and write those files in excel with their respective folder address?

Riya on 1 Apr 2021
Commented: J. Alex Lee on 2 Apr 2021
I have 78 files, with their names and folder paths stored in a structure. I want to check which file names occur more than once, that is, which files have the same name but a different path. After that, I need to write those multiply occurring files, with their respective folder paths, to an Excel file. How can I do that? Please suggest the code.
  3 Comments
Riya on 1 Apr 2021
In our structure we have the names of the files in one column and their folder paths in another column. Yes, it was created using the dir function.


Accepted Answer

Image Analyst on 1 Apr 2021
Riya, try this:
% Find duplicated file names in subfolders.
fileList = dir('**\*.*')
t = struct2table(fileList);
totalNumberOfFiles = numel(fileList);
fileCount = zeros(1, totalNumberOfFiles);
uniqueNames = unique(t.name)
duplicatedNames = cell(0);
duplicateCount = 1;
for k = 1 : totalNumberOfFiles
    baseFileName = fileList(k).name;
    % Count how many listing entries share this name.
    logicalIndex = ismember(t.name, baseFileName);
    fileCount(k) = sum(logicalIndex);
    if fileCount(k) > 1 && fileList(k).isdir == 0
        duplicatedNames{duplicateCount, 1} = baseFileName;
        duplicatedNames{duplicateCount, 2} = fileList(k).folder;
        duplicateCount = duplicateCount + 1;
    end
end
if ~isempty(duplicatedNames)
    % Sort whole rows by file name so each name stays paired with its folder.
    duplicatedNames = sortrows(duplicatedNames, 1)
end
fprintf('Done running %s.m\n', mfilename);
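The snippet above collects the duplicates but does not yet write them to Excel, which the question asks for. A minimal sketch of that last step, assuming duplicatedNames is the N-by-2 cell array built above (the output file name here is made up):

```matlab
% Write the duplicated names and their folders to an Excel file.
% Assumes duplicatedNames is the N-by-2 cell array built above.
if ~isempty(duplicatedNames)
    T = cell2table(duplicatedNames, 'VariableNames', {'FileName', 'Folder'});
    writetable(T, 'duplicate_files.xlsx');
end
```

On R2019a or later, writecell(duplicatedNames, 'duplicate_files.xlsx') does the same without building a table first.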
  2 Comments
Image Analyst on 1 Apr 2021
Try this:
% Find duplicated file names in subfolders.
fileList = dir('**\*.m*')
t = struct2table(fileList);
totalNumberOfFiles = numel(fileList);
fileCount = zeros(1, totalNumberOfFiles);
uniqueNames = unique(t.name)
duplicatedNames = cell(0);
duplicateCount = 1;
for k = 1 : totalNumberOfFiles
    baseFileName = fileList(k).name;
    [logicalIndex, ib] = ismember(t.name, baseFileName);
    fileCount(k) = sum(logicalIndex);
    if fileCount(k) > 1 && fileList(k).isdir == 0
        duplicatedNames{duplicateCount, 1} = baseFileName;
        linearIndex = find(ib); % Get the actual indexes of this name.
        for k2 = 1 : length(linearIndex)
            row = linearIndex(k2);
            duplicatedNames{duplicateCount, k2+1} = fileList(row).folder;
        end
        duplicateCount = duplicateCount + 1;
    end
end
if ~isempty(duplicatedNames)
    % Sort the first column and apply that order to whole rows,
    % so names stay paired with their folders.
    [~, sortOrder] = sort(duplicatedNames(:, 1));
    duplicatedNames = duplicatedNames(sortOrder, :);
end
% Remove repeated rows in the duplicatedNames list.
numDuplicates = size(duplicatedNames, 1);
rowsToDelete = false(numDuplicates, 1);
for k = 2 : numDuplicates
    if strcmpi(duplicatedNames{k-1, 1}, duplicatedNames{k, 1})
        % Same name appeared again. Mark this row for deletion.
        rowsToDelete(k) = true;
    end
end
% Keep only one row per duplicated file name.
duplicatedNames = duplicatedNames(~rowsToDelete, :)
fprintf('Done running %s.m\n', mfilename);
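For comparison, here is a shorter alternative that counts each unique name once with unique and accumarray instead of calling ismember inside the loop. This is a sketch, not part of the accepted answer, and the output file name is made up:

```matlab
% Count each unique file name once, then keep rows whose name repeats.
fileList = dir('**\*.*');
fileList([fileList.isdir]) = [];   % drop folder entries
names = {fileList.name}';
[uNames, ~, idx] = unique(names);
counts = accumarray(idx, 1);       % occurrences of each unique name
isDup = counts(idx) > 1;           % true for every row whose name repeats
dupTable = table(names(isDup), {fileList(isDup).folder}', ...
    'VariableNames', {'FileName', 'Folder'});
writetable(dupTable, 'duplicate_files.xlsx');
```

This keeps one row per occurrence, so a name that appears in three folders produces three rows, each with its own folder.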


More Answers (1)

J. Alex Lee on 1 Apr 2021
Edited: J. Alex Lee on 2 Apr 2021
Here's a way that doesn't require as many loops and calls to ismember.
% % uncomment to simulate the results of dir and test
% nfiles = 12;
% names = "file_" + randi(9, nfiles, 1)
% folders = "folder_" + (1:nfiles)'
%
% for i = nfiles:-1:1
%     fl(i).folder = folders(i);
%     fl(i).name = names(i);
% end
% comment this if you want to simulate and test
fl = dir('**\*.*')
% turn structure array into table to leverage some nice features
t = struct2table(fl);
nfiles = height(t); % number of entries in the listing
% sort the file list by the name, so duplicates will be "grouped" by row
ts = sortrows(t, "name")
% find unique file names and indices of each file name
[unames, ~, uIdx] = unique(ts.name)
% uIdx now contains indices of the full list into the unique list,
% so we can identify duplicates as consecutively equal uIdx values
% determine the size of these groups (of duplicates)
grpSizes = diff([0; find(diff(uIdx) ~= 0); nfiles])
% split the row indices into groups (of duplicates)
rowIdxGroups = mat2cell((1:nfiles)', grpSizes)
% identify groups that contain duplicates
dupIdxGrouped = rowIdxGroups(grpSizes > 1)
dupIdx = vertcat(dupIdxGrouped{:})
% a table containing only entries where the file name exists in replicate
ts(dupIdx, :)
% and then dump only the duplicate rows to an Excel file
writetable(ts(dupIdx, :), "duplicates_only.xlsx")
  1 Comment
J. Alex Lee on 2 Apr 2021
@Riya, does this work?
It has the advantage that it operates directly on the indices of the table, so you do not have to manually track the other fields of the structure array that you are interested in.
You can also use the 3rd output of "unique" to directly assign each name to a "duplicate" group (you could also use "findgroups"). This way, you only need to call unique once rather than searching for duplicates of each unique element. If you sort those results by index (or, alternatively, pre-sort your file list by name), you can quickly identify duplicates wherever consecutive values are equal (by using diff and detecting zeros).
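That diff-and-zeros idea can be seen on a toy example (the names below are made up):

```matlab
% Toy illustration of the unique/diff grouping described above.
names = sort({'a.m'; 'b.m'; 'b.m'; 'c.m'; 'c.m'; 'c.m'});
[~, ~, uIdx] = unique(names);          % uIdx = [1; 2; 2; 3; 3; 3]
isRepeat = [false; diff(uIdx) == 0];   % true where name equals the one above
% mark whole duplicate groups, not just the 2nd+ occurrence
hasDup = isRepeat | [isRepeat(2:end); false];
names(hasDup)                          % -> {'b.m'; 'b.m'; 'c.m'; 'c.m'; 'c.m'}
```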
The most confusing part of my proposed code is how I split the list into duplicate groups; the code above uses mat2cell, but that's just hiding a loop, which you could just as well write out to make it more human-readable (and maybe even faster).
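As a sketch of that more readable loop, assuming grpSizes as computed in the answer:

```matlab
% Loop equivalent of the mat2cell split (assumes grpSizes from the answer).
nGroups = numel(grpSizes);
rowIdxGroups = cell(nGroups, 1);
stop = cumsum(grpSizes);             % last row index of each group
start = [1; stop(1:end-1) + 1];      % first row index of each group
for g = 1:nGroups
    rowIdxGroups{g} = (start(g):stop(g))';
end
```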
Once you create some representation of duplicateness (rowIdxGroups, dupIdxGrouped, dupIdx), you can directly index into the file list table to do any manipulations you want.
This example also completes your question by directly writing the duplicate-only list to an Excel file in an easy one-liner.

