Replacing only certain instances of text within a MATLAB character array

Method One: split after the first match

>> str = '___6002__6002___6002___6002__';
>> idx = regexp(str,'6002','once','end');
>> strcat(str(1:idx),strrep(str(idx+1:end),'6002','0'))
ans =
___6002__0___0___0__
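
The first approach above (find where the first match ends, then replace only in the remainder) can be sketched step by step as follows. This is a minimal sketch, not the answer's exact code: the names head, tail, and result are illustrative, the empty-match guard is an addition, and square-bracket concatenation is swapped in for strcat on the assumption that trailing whitespace should be preserved (strcat drops it for char inputs).

```matlab
str = '___6002__6002___6002___6002__';

% Index of the last character of the first match; empty if there is no match
idx = regexp(str, '6002', 'once', 'end');

if isempty(idx)
    result = str;                                % nothing to protect or replace
else
    head = str(1:idx);                           % up to and including the first match
    tail = strrep(str(idx+1:end), '6002', '0');  % replace every later match
    result = [head, tail];                       % [] keeps trailing whitespace
end
disp(result)
```

With the example string this should reproduce the output shown above.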

Method Two: use a placeholder

>> str = '___6002__6002___6002___6002__';
>> str = regexprep(str,'6002','\b','once');
>> str = strrep(str,'6002','0');
>> regexprep(str,'\b','6002')
ans =
___6002__0___0___0__

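Note that the original string must not already contain a backspace character (which is what \b stands for here), because it serves as the temporary placeholder.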
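
The placeholder approach can be guarded with an explicit check before any replacement happens. The sketch below assumes that the placeholder is the backspace character, char(8), and writes it as a literal character rather than as an escape sequence. The variable name placeholder and the error message are illustrative.

```matlab
str = '___6002__6002___6002___6002__';
placeholder = char(8);   % backspace, assumed to be the placeholder character

% Guard: the trick is only safe if the input does not already contain the placeholder
assert(~any(str == placeholder), 'Input already contains the placeholder character.');

str = regexprep(str, '6002', placeholder, 'once');  % protect the first match
str = strrep(str, '6002', '0');                     % replace the remaining matches
str = strrep(str, placeholder, '6002');             % restore the protected match
disp(str)
```

Any character that cannot occur in the data would work equally well as the placeholder.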

Method Three: dynamic regular expression

>> str = '___6002__6002___6002___6002__';
>> regexprep(str,'(.*?6002)(.*)','$1${strrep($2,''6002'',''0'')}')
ans =
___6002__0___0___0__
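
In the dynamic expression, the lazy group (.*?6002) captures everything up to and including the first match, (.*) captures the rest, and ${...} runs strrep on that second token. The same two tokens can be pulled out explicitly, which may be easier to debug. This is a sketch only; tok and result are illustrative names, and the no-match branch is an addition.

```matlab
str = '___6002__6002___6002___6002__';

% tok{1}: text through the first '6002'; tok{2}: everything after it
tok = regexp(str, '(.*?6002)(.*)', 'tokens', 'once');

if isempty(tok)
    result = str;                                    % pattern not found
else
    result = [tok{1}, strrep(tok{2}, '6002', '0')];  % replace only in the remainder
end
disp(result)
```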

2 comments

John Leal on 16 Oct 2017

I have a similar problem. I need to replace some words with others in a very large array. I have working code, but it is too slow. Can you help me find a way to make it faster?

% Replace punctuation with spaces (brackets and hyphen escaped inside the class)
textData = regexprep(textData, '[@$/#.\-:&*+=\[\]?!(){},''">_<;%]', ' ');
% Remove any non-alphabetic characters (keeps ñ and spaces).
% Note: digits, accented vowels, and angle brackets are already gone after this
% line, so the next seven replacements only take effect if moved above it.
textData = regexprep(textData, '[^a-zA-Zñ ]', '');
textData = regexprep(textData, '[0-9]+', ' ');
textData = regexprep(textData, '<[^<>]+>', ' ');
textData = regexprep(textData, 'á', 'a');
textData = regexprep(textData, 'é', 'e');
textData = regexprep(textData, 'í', 'i');
textData = regexprep(textData, 'ó', 'o');
textData = regexprep(textData, 'ú', 'u');
textData = regexprep(textData, 'ñ', 'n');
textData = regexprep(textData, 'x', 's');
textData = regexprep(textData, 'cc', 'c');
textData = regexprep(textData, 'ci', 'si');
% deletedWords  = ["helllo","hello";"moter","mother"] ... 50000 rows
% excludedWords = ["father","three","tree"] ... words I don't want to replace
% textData      = ["my mother lives with my father";"hello Word"] ... 2 million rows
m = size(deletedWords, 1);
for idx = 1:m
    w_new = deletedWords{idx,1};   % misspelled word
    w_ok  = deletedWords{idx,2};   % corrected word
    % Replace only if the word is not in excludedWords
    if ~any(excludedWords == w_new)
        % Replace EXACT word matches only
        textData = regexprep(textData, "(?<![\w])" + w_new + "(?![\w])", w_ok);
    end
end

John Leal on 16 Oct 2017

The main idea is to correct misspelled words in Spanish. It works like a handmade stemmer adjusted to my specific data. deletedWords contains each misspelled word alongside its correct form. These pairs are extracted from the same textData using Jaro-Winkler similarity, mapping each less frequent word to a more frequent word that is more than 95% similar.

Thanks
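
One possible way to avoid running regexprep over the whole array once per dictionary entry is to filter the dictionary once and then look words up in bulk. The sketch below is untested on data of the size described and has not been benchmarked. It assumes that textData is a string array whose rows contain only letters and spaces after the cleanup step, so that splitting on a single space is equivalent to an exact-word match. The small inputs and the names keep, wrong, right, found, and loc are hypothetical.

```matlab
% Hypothetical small inputs standing in for the real 50000-row and 2-million-row arrays
deletedWords  = ["helllo","hello"; "moter","mother"; "tree","three"];
excludedWords = ["father","three","tree"];
textData      = ["my moter lives with my father"; "helllo Word tree"];

% Drop excluded misspellings once, instead of testing inside the loop
keep  = ~ismember(deletedWords(:,1), excludedWords);
wrong = deletedWords(keep,1);
right = deletedWords(keep,2);

% One pass over the rows: split into words, look them all up at once, rejoin
for r = 1:numel(textData)
    words = split(textData(r), " ");        % column string array of words
    [found, loc] = ismember(words, wrong);  % loc indexes into wrong and right
    words(found) = right(loc(found));       % substitute the corrected forms
    textData(r) = join(words, " ");
end
disp(textData)
```

Here each row is visited once and the dictionary lookup is a single ismember call per row, rather than one regular-expression pass over all rows for every dictionary entry.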
