Replacing only certain instances of text within a MATLAB character array
rob
on 20 Jan 2017
Commented: John Leal
on 16 Oct 2017
I have a large character array in MATLAB, lineDataA, containing many different numbers.
I would like to find all instances of the number '6002' and replace them with '0', except for the very first instance.
lineData = replace(lineDataA, '6002', '0');
This replaces all instances, and
where6002 = strfind(lineDataA, '6002');
gives the positions of all the instances. However, I am not sure how to replace all the instances except the first.
Many thanks for your help,
Rob
Accepted Answer
Stephen23
on 20 Jan 2017
Edited: Stephen23
on 20 Jan 2017
Method One: split the string
>> str = '___6002__6002___6002___6002__';
>> idx = regexp(str,'6002','once','end');
>> [str(1:idx), strrep(str(idx+1:end),'6002','0')]  % [] keeps trailing spaces, which strcat would remove
ans =
___6002__0___0___0__
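The same split-the-string idea can also be written with the positions returned by strfind, which the question already computes. This is a minimal sketch, not part of the accepted answer: replaceAllButFirst is a hypothetical helper name, it assumes a character-vector input (not a string scalar), and it returns the input unchanged when there is no match.

```matlab
function out = replaceAllButFirst(str, old, new)
% Hypothetical helper: keep the first occurrence of OLD in the char vector STR
% and replace every later occurrence with NEW.
pos = strfind(str, old);                % start indices of all matches
if isempty(pos)
    out = str;                          % no match: return the input unchanged
    return
end
cut = pos(1) + numel(old) - 1;          % index of the last character of the first match
% Square-bracket concatenation keeps trailing whitespace (strcat would drop it)
out = [str(1:cut), strrep(str(cut+1:end), old, new)];
end
```

Applied to the question's data this would be called as replaceAllButFirst(lineDataA, '6002', '0').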
Method Two: use a placeholder
>> str = '___6002__6002___6002___6002__';
>> str = regexprep(str,'6002','\b','once');
>> str = strrep(str,'6002','0');
>> regexprep(str,'\b','6002')
ans =
___6002__0___0___0__
Note that the original string must not already contain the placeholder: in a regexprep replacement, '\b' inserts a backspace character, char(8).
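The placeholder method can be guarded so that it fails loudly instead of silently corrupting text that already contains a backspace. This is a sketch under stated assumptions: replaceAllButFirstPlaceholder is a hypothetical name, the input is a character vector, and char(8) is used directly as the placeholder (the same character that '\b' produces in a regexprep replacement). regexptranslate('escape', ...) makes the search text match literally even if it contains regular-expression metacharacters.

```matlab
function out = replaceAllButFirstPlaceholder(str, old, new)
% Hypothetical helper: placeholder method with an explicit check for the placeholder.
placeholder = char(8);                   % backspace, what '\b' inserts in regexprep
if any(str == placeholder)
    error('replaceAllButFirstPlaceholder:placeholderPresent', ...
          'Input already contains char(8); choose a different placeholder.');
end
% Park the first match as the placeholder (OLD escaped so it matches literally)
tmp = regexprep(str, regexptranslate('escape', old), placeholder, 'once');
tmp = strrep(tmp, old, new);             % replace every remaining match
out = strrep(tmp, placeholder, old);     % restore the first match
end
```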
Method Three: dynamic regular expression
>> str = '___6002__6002___6002___6002__';
>> regexprep(str,'(.*?6002)(.*)','$1${strrep($2,''6002'',''0'')}')
ans =
___6002__0___0___0__
2 Comments
John Leal
on 16 Oct 2017
I have a similar problem. I need to replace some words with others in a large array. I have the code, but it is too slow. Can you help me find a way to make it faster?
% Strip HTML-like tags first, while '<' and '>' are still in the text
textData = regexprep(textData, '<[^<>]+>', ' ');
% Convert accented letters BEFORE filtering characters; otherwise the
% filter below deletes them before they can be converted
textData = regexprep(textData, 'á', 'a');
textData = regexprep(textData, 'é', 'e');
textData = regexprep(textData, 'í', 'i');
textData = regexprep(textData, 'ó', 'o');
textData = regexprep(textData, 'ú', 'u');
textData = regexprep(textData, 'ñ', 'n');
% Replace digits with spaces
textData = regexprep(textData, '[0-9]+', ' ');
% Replace any remaining non-letter character (punctuation, symbols) with a space
textData = regexprep(textData, '[^a-zA-Z ]', ' ');
% Spanish-specific spelling normalisation
textData = regexprep(textData, 'x', 's');
textData = regexprep(textData, 'cc', 'c');
textData = regexprep(textData, 'ci', 'si');
% deletedWords = ["helllo","hello";"moter","mother"] ... 50000 rows: [misspelling, correction]
% excludedWords = ["father","three","tree"] ... words I don't want to replace
% textData = ["my mother lives with my father";"hello Word"] ... 2 million rows
m = size(deletedWords, 1);
for idx = 1:m
    w_new = deletedWords{idx,1};   % misspelled word
    w_ok  = deletedWords{idx,2};   % correct word
    % Only replace words that are not in excludedWords
    if ~any(excludedWords == w_new)
        % Replace EXACT word matches only (no word character on either side)
        textData = regexprep(textData, "(?<!\w)" + w_new + "(?!\w)", w_ok);
    end
end
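Part of the cost of the clean-up steps is the number of separate passes over 2 million rows. One possible reduction, offered as an untested sketch rather than a measured fix: do the literal substitutions with replace, which matches literal text instead of a regular expression, and batch the accented letters into a single call. normalizeSpanish is a hypothetical helper; it assumes textData is a string array or character vector and MATLAB R2016b or later (for replace). The steps are ordered so that accented letters are converted before non-letters are stripped.

```matlab
function textData = normalizeSpanish(textData)
% Hypothetical helper: the same clean-up steps with fewer passes.
% Requires MATLAB R2016b or later for replace.
textData = regexprep(textData, '<[^<>]+>', ' ');           % strip HTML-like tags first
% Convert accented letters before stripping non-letters, in one literal replace call
textData = replace(textData, {'á','é','í','ó','ú','ñ'}, {'a','e','i','o','u','n'});
% Each run of digits, punctuation or symbols becomes a single space
textData = regexprep(textData, '[^a-zA-Z ]+', ' ');
% Spanish spelling normalisation; separate calls because the order matters
textData = replace(textData, 'x', 's');
textData = replace(textData, 'cc', 'c');
textData = replace(textData, 'ci', 'si');
end
```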
John Leal
on 16 Oct 2017
The main idea is to correct misspelled words in Spanish. It is like a hand-made stemmer adjusted to my specific data. deletedWords contains each misspelled word and its correct form. These pairs are extracted from the same textData using Jaro-Winkler similarity, mapping each less frequent word to a more frequent word with more than 95% similarity.
Thank you
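Because the correction is a word-for-word lookup, the 50000 separate regexprep passes could in principle be replaced by one pass over the words of each row, looking every word up in the list of misspellings with ismember. This is a hedged sketch, not something proposed in the thread: correctWords is a hypothetical helper; it assumes textData is a string array that has already been normalised so words are separated by whitespace; runs of whitespace between words are collapsed to single spaces; and, unlike the sequential regexprep loop, each word is corrected at most once (a corrected word is not re-corrected by a later rule).

```matlab
function out = correctWords(textData, deletedWords, excludedWords)
% Hypothetical helper: word-level spelling correction, one pass per row.
% deletedWords:  N-by-2 string array, [misspelling, correction] per row
% excludedWords: string array of words that must never be replaced
% textData:      string array, words separated by whitespace
keep  = ~ismember(deletedWords(:,1), excludedWords);   % drop excluded misspellings
wrong = deletedWords(keep, 1);
right = deletedWords(keep, 2);
out = string(textData);
for r = 1:numel(out)
    words = split(out(r));                  % column of words (splits on whitespace)
    [found, loc] = ismember(words, wrong);  % loc(k) = row of words(k) in wrong
    words(found) = right(loc(found));       % swap in the corrections
    out(r) = join(words);                   % rejoin with single spaces
end
end
```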