Best solution to finding repeating characters on a line.
I am looking for any instance of two characters (e/d) repeated in a row 10 or more times. I want to either print every line where this occurs to the command line, or stop and print the location each time it is detected. Basically, I am trying to find where e and d show up ten or more times grouped together in a large data file. For example:
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs
asseefadfefeeedddeeedddasdfsdf
The script would then print out line 2 and line 4 in the command line.
Thank you for your help
1 Comment
Rena Berman
on 26 Sep 2023
(Answers Dev) Restored edit
Accepted Answer
More Answers (1)
You say "10 or over", so is it correct that the program needs to find all possible patterns? For example,
'adadadadaaaadadadadaaa'
(length 22) should be located if it exists?
S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf'}
matches = regexp(S, '([ad]{5,})\1', 'match');
celldisp(matches)
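As a side note on what that pattern accepts: '([ad]{5,})\1' requires a block of five or more a/d characters followed immediately by an identical copy of the same block, which is stricter than "any run of ten or more a/d characters". The sketch below compares the two on the first string above and on a second, made-up string ('xxaaaaadddddxx' is an assumption added purely for illustration, not from the original data).

```matlab
% Compare the backreference pattern with a plain run-length pattern.
% The second string is a hypothetical example, not from the original post.
S = {'asseefadfefaaadddaaadddasdfsdf', 'xxaaaaadddddxx'};

% A block of 5+ a/d characters immediately followed by an identical copy.
repeatedBlock = regexp(S, '([ad]{5,})\1', 'match', 'once');

% Any run of 10+ a/d characters, in any order.
anyRun = regexp(S, '[ad]{10,}', 'match', 'once');

% Logical masks showing which strings each pattern accepts.
hasRepeatedBlock = ~cellfun(@isempty, repeatedBlock)
hasAnyRun = ~cellfun(@isempty, anyRun)
```

The second string contains a run of ten a/d characters, but its two halves ('aaaaa' and 'ddddd') differ, so only the run-length pattern accepts it. Which behaviour is wanted depends on how "repeated" is meant in the question.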
5 Comments
Matthew Worker
on 13 Jul 2021
S = {'asseefadfefaaadddaaadddasdfsdf', 'asseeadadadadaaaadadadadaaadfsdf', 'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', 'asseefadfefaaadddaaadddasdfsdf', 'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs', 'asseefadfefaaadddaaadddasdfsdf'}
matchidx = regexp(S, '([ad]{5,})\1', 'once')
S(~cellfun(@isempty, matchidx))
Walter Roberson
on 13 Jul 2021
... Wait, any two characters, or two specific characters?
Matthew Worker
on 13 Jul 2021
Example of reading from file:
%create a file for demonstration purposes only
tname = [tempname() '.txt'];
fid = fopen(tname, 'w');
T = regexprep('asseefadfefaaadddaaadddasdfsdf\nasseeadadadadaaaadadadadaaadfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\nasdfsdfsdfsasdfsdfsdfsasdfsdfsdfs\nasseefadfefaaadddaaadddasdfsdf\n', 'a', 'e');
fprintf(fid, T);
fclose(fid);
%okay, main function
filename = tname;
S = readlines(filename);
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
%alternative without readlines
S = regexp(fileread(filename), '\r?\n', 'split');
matches = S(~cellfun(@isempty, regexp(S, '[de]{10}', 'once')));
matches
%alternative without splitting
S = fileread(filename);
matches = regexp(S, '^.*[de]{10}.*$', 'match', 'dotexceptnewline', 'lineanchors');
matches
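Since the original question asked for the line numbers (and optionally the location) rather than just the matching text, here is a minimal sketch of that variation. It assumes the lines are already in a cell array; the four sample lines are copied from the question, and the variable names startIdx and hits are made up for this example.

```matlab
% Sample lines from the question, standing in for the contents of the data file.
S = {'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'; ...
     'asseefadfefeeedddeeedddasdfsdf'; ...
     'asdfsdfsdfsasdfsdfsdfsasdfsdfsdfs'; ...
     'asseefadfefeeedddeeedddasdfsdf'};

% With 'once' and no output selector, regexp returns the start index of the
% first run of 10 or more d/e characters in each line, or [] if there is none.
startIdx = regexp(S, '[de]{10,}', 'once');

% Line numbers of the lines that contain such a run.
hits = find(~cellfun(@isempty, startIdx));

% Print the line number, the column where the run starts, and the line itself.
for k = hits(:).'
    fprintf('line %d, column %d: %s\n', k, startIdx{k}, S{k});
end
```

For a real file, S could come from any of the reading approaches above; if readlines is used, convert with cellstr(S) first so that S{k} yields a character vector.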