Finding a string in a file

Hello,
I need help with this problem:
How can I find the files with the same name in different folders that have the number "5800" in them? Furthermore, how can I list these files/folders and the line(s) on which the number "5800" appears in the files?
I would really appreciate the help.
Thanks.

2 comments

Fangjun Jiang on 22 Jul 2011
Is "5800" in the folder name or in the file name? Is the file an ASCII text file?
Walter Roberson on 23 Jul 2011
"5800" is a target string to be found amongst several files.

Answers (8)

Walter Roberson on 23 Jul 2011

3 votes

CommonFileName = 'x3175.txt'; %or whatever the common name is
folderinfo = dir('*');
folderinfo(~[folderinfo.isdir]) = []; %keep folders only
folderinfo(ismember({folderinfo.name},{'.','..'})) = []; %drop the '.' and '..' entries
for FoIdx = 1 : length(folderinfo)
  specificname = fullfile(folderinfo(FoIdx).name, CommonFileName);
  if exist(specificname, 'file')
    %at this point, insert your code to examine specificname
  end
end
Jason Ross on 21 Sep 2012

2 votes

I realize this question is old, but in Windows Explorer (and not strictly a MATLAB question, either), you can put "content: " in the search box in the upper right hand corner, and it will search the files (including .doc files) for the string. So for this question, putting "content: 5800" would have searched and returned a list of the files that had "5800" in it.
Also, note that you can try finding something in Explorer, and then at the bottom it says "Search again in:", and one of the choices is "content". Not the most usable thing (the search dog in previous Windows versions exposed this more quickly), but it is there.
There is also a "find" command you can run from the Command Shell. Just cd to the directory in question and type
find "5800" *
and the current directory will be searched. I checked a directory where I had some Word documents and it successfully searched them.
Honglei Chen on 22 Jul 2011

1 vote

Hi osminbas,
You could write a script to achieve this. Use cd to change directories and the what function to list the files in each folder; then, for each file, use textscan to read each line and strfind to look for '5800'. You then just write the result to either the screen or a file.
HTH
Honglei
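The steps described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name findStringInSubfolders and its arguments are hypothetical, the files are treated as plain text, dir is used to list the subfolders, and fgetl is swapped in for textscan so that blank lines still count toward the line numbers.

```matlab
function hits = findStringInSubfolders(rootDir, commonName, target)
% Hypothetical helper: look in every subfolder of rootDir for a file named
% commonName and report the line numbers on which the text target occurs.
hits = struct('file', {}, 'lines', {});
d = dir(rootDir);
d = d([d.isdir] & ~ismember({d.name}, {'.', '..'}));   % real subfolders only
for k = 1:numel(d)
    fname = fullfile(rootDir, d(k).name, commonName);
    if exist(fname, 'file') ~= 2
        continue                       % this subfolder does not contain the file
    end
    fid = fopen(fname, 'rt');
    n = 0;
    matched = [];
    tline = fgetl(fid);
    while ischar(tline)                % fgetl returns -1 at end of file
        n = n + 1;
        if ~isempty(strfind(tline, target))
            matched(end+1) = n;        %#ok<AGROW>
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    if ~isempty(matched)
        hits(end+1) = struct('file', fname, 'lines', matched); %#ok<AGROW>
        fprintf('%s: line(s) %s\n', fname, mat2str(matched));
    end
end
end
```

It could be called as hits = findStringInSubfolders(pwd, 'x3175.txt', '5800'); each element of hits then holds one file path and the matching line numbers.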
Walter Roberson on 23 Jul 2011

1 vote

On unix systems:
!grep -Hn 5800 */TheFileName.txt
Fangjun Jiang on 23 Jul 2011

0 votes

How about this? Open the M-Editor and select the menu Edit->Find Files. Specify the text to find, which files to search, and which folder/sub-folders to look in; the result will show all the files, folders, and lines in which the text appears.
Many other IDEs probably have the same capability too.
osminbas on 25 Jul 2011

0 votes

Thank you all. I realized that I misworded my question. I am trying to find the string "5800" in the file itself (it is a .doc file), not in the name of the file. Again, I appreciate your help.

6 comments

Fangjun Jiang on 25 Jul 2011
Does it have anything to do with MATLAB?
Walter Roberson on 25 Jul 2011
All of the solutions presented are for finding "5800" in the _content_ of the file.
On the other hand, if it is a .doc file then it might not be encoded in ASCII, ISO-8859-1, or UTF-8: Microsoft has a fondness for storing strings in UTF-16 (little-endian on Windows). For characters that fit within the US-ASCII or ISO-8859-1 character sets, the difference is that each of those characters is represented in UTF-16LE as a pair of bytes, with the first byte of the pair being the normal US-ASCII or ISO-8859-1 byte encoding and the second byte of the pair being binary 0. You could end up needing to search for the binary uint8(['5' 0 '8' 0 '0' 0 '0' 0]), which is usually harder for routines to find, as it is not uncommon for routines to think that the first uint8(0) in a string marks the end of the string (a holdover from C's string structure). Sometimes the easiest method in such cases is to read the file as uint8 values, use native2unicode() with the matching UTF-16 encoding to convert the bytes to MATLAB char() data, and then search that.
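A minimal sketch of that byte-decoding approach, assuming a file that contains nothing but UTF-16 text (a real .doc file also carries binary structure and markup, so this is not a .doc reader). The demo file and the variable names are hypothetical, and the encoding name is an assumption that has to match the file's actual byte order ('UTF-16LE' or 'UTF-16BE').

```matlab
enc = 'UTF-16LE';                      % assumption: set to the file's actual byte order

% Create a small demo file so the example is self-contained.
fname = [tempname '.txt'];             % hypothetical temporary file
fid = fopen(fname, 'w');
fwrite(fid, unicode2native('id 5800 ok', enc), 'uint8');
fclose(fid);

% Read the raw bytes, decode them to char, then search the decoded text.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, 'uint8=>uint8').';   % raw bytes as a row vector
fclose(fid);
txt = native2unicode(bytes, enc);      % byte pairs become ordinary MATLAB characters
idx = strfind(txt, '5800');            % character positions of each match
delete(fname);
```

Searching the decoded text avoids having to match the interleaved zero bytes directly.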
There can be additional complications in any structured file such as .DOC files: what we humans look at and see as '5800' in the file might happen to have (for example) '5' <change font face> '8' <end font weight> '0' <end font face> <change font color> '0' <end font color> . Integrating with the OpenOffice freeware may perhaps be the easiest solution to strip the markup out and allow search on plain characters.
Walter Roberson on 25 Jul 2011
Or just go through and save each .doc file as plain text without markup and then search that plain text.
osminbas on 28 Jul 2011
Thank you all. Especially, thank you, Walter. In your code, when I define the folderinfo, it is the "big" folder that has all the subfolders in it. Some of these subfolders have the file I am looking for and some of them don't. How can I get only the ones with that file in them? And furthermore, how can I list these folder names in an Excel sheet or text file (whichever is easier)?
Also, how do I search for the string "5800" inside the document? I know that you talked about the difficulties of reading inside .doc files but maybe you can answer my question assuming it is a txt file.
I really appreciate your help.
Walter Roberson on 28 Jul 2011
The line "if exist(specificname, 'file')" that I put in the code checks whether the file exists in the subdirectory you are processing, and skips the string search if the file is not there.
For searching within a specific file, there are many ways. One way is:
fid = fopen(specificname, 'rt');
lines = textscan(fid,'%[^\n]'); %reads line by line
fclose(fid);
lines = lines{1}; %textscan returns a 1x1 cell holding the cell array of lines
L = find(~cellfun(@isempty,strfind(lines, '5800')),1,'first'); %first matching line only
if ~isempty(L)
  fprintf('found in %s at line %d\n', specificname, L);
end
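For the other part of the question, keeping only the subfolders that contain the file and listing them in a text file, a rough sketch could look like the following. The function name listFoldersWithFile and the output file name are hypothetical, and a plain text file is assumed because it is the simpler of the two options mentioned.

```matlab
function found = listFoldersWithFile(rootDir, commonName, outName)
% Hypothetical helper: write the names of the subfolders of rootDir that
% contain commonName to the text file outName, one per line, and return
% those names as a cell array.
d = dir(rootDir);
d = d([d.isdir] & ~ismember({d.name}, {'.', '..'}));   % real subfolders only
hasFile = false(size(d));
for k = 1:numel(d)
    hasFile(k) = exist(fullfile(rootDir, d(k).name, commonName), 'file') == 2;
end
found = {d(hasFile).name};
fid = fopen(outName, 'wt');
if ~isempty(found)
    fprintf(fid, '%s\n', found{:});    % the format is reused for each folder name
end
fclose(fid);
end
```

For example, found = listFoldersWithFile(pwd, 'x3175.txt', 'folders_with_file.txt') would write the matching folder names into folders_with_file.txt in the current directory.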
K E on 21 Sep 2012
I just used this code in another project, so thanks Walter

venkat vasu on 22 Sep 2012
Edited: Walter Roberson on 22 Sep 2012

0 votes

a1=dir;
l=length(a1);
for i1=3:l
files=dir(a1(i1).name);
nfiles = length(files);
for i=3:nfiles
currentfilename = files(i).name;
if currentfilename==5800
%whatever operation
end
end
end
this code surely will help you...

1 comment

Walter Roberson on 22 Sep 2012
The task was to search the content, not the file name.
Also, "currentfilename" from files(i).name will be a string, but you attempt to compare the string to the numeric value 5800 . That is not going to have the result you expect.
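To illustrate the comparison problem, here is a small sketch with made-up variable names. Comparing a character vector to a number with == compares each character's code to that number, whereas strcmp and strfind compare text.

```matlab
currentfilename = '5800';

% == compares the character codes (53 56 48 48) to the number 5800,
% element by element, so none of the comparisons is true.
elementwise = (currentfilename == 5800);

% strcmp compares the whole text and returns a single logical value.
isSameName = strcmp(currentfilename, '5800');

% strfind locates a substring, e.g. inside a longer (hypothetical) file name.
containsTarget = ~isempty(strfind('run5800.txt', '5800'));
```

An if statement on elementwise would never execute its body, which is why the loop above does nothing.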
