Vectorizing unicode2native

Hi,
I am trying to remove characters that are not compatible with the windows-1252 encoding. I can do this with a for-loop, but I would like to speed the algorithm up if possible.
events_Description = ["1:BRAZIL-ANAPOLIS"; "2:BRAZIL-Barra de São Francisco"; "3:BRAZIL-CAJAMAR"; "4:BRAZIL-CAMAÇARI"; "5:BRAZIL-CANSANÇÃO"; "6:BRAZIL-CONCEIÇÃO DE FEIRA"; "7:BRAZIL-CONCEIÇÃO DO JACUÃPE"];
tdata = repmat("", height(events_Description), 1);
for runi = 1:length(events_Description)
    % Encode the text as windows-1252 bytes
    tnew = unicode2native(events_Description(runi), 'windows-1252');
    % Drop the byte values that are undefined in windows-1252
    tnew(ismember(tnew, [129 141 143 144 157])) = [];
    % Decode the remaining bytes back to text
    tdata(runi) = native2unicode(tnew, 'windows-1252');
end

Answers (1)

Walter Roberson on 1 Dec 2022


% Precompute a lookup table. MaximumPermittedCharacter is the largest code point expected in the input.
L = false(1, MaximumPermittedCharacter);
L(native2unicode(1:255, 'windows-1252')) = true;
% If you want to disallow specific characters, set their entries to false.
% Then the filtering becomes:
Message = Message(L(Message));
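The answer above can be tried on a single message with the following minimal sketch. The bound of U+23FF, the variable names `allowed`, `maxCodePoint`, `msg`, and `filtered`, and the test message are assumptions made for illustration. The sketch also assumes that the five byte values removed in the question correspond to code points with the same numbers, which the question's round trip suggests but the thread does not state.

```matlab
% Assumption: no input character lies above U+23FF.
maxCodePoint = 9215;    % 0x23FF

% allowed(k) is true when code point k is produced by some windows-1252 byte.
allowed = false(1, maxCodePoint);
allowed(double(native2unicode(uint8(1:255), 'windows-1252'))) = true;

% Assumption: the byte values removed in the question map to code points
% with the same numbers, so those entries are switched off directly.
allowed([129 141 143 144 157]) = false;

% Hypothetical message: 'CAMAÇARI' followed by the stray character U+008D.
msg = ['CAMA' char(199) 'ARI' char(141)];

% Index the table with the character codes; only permitted characters remain.
filtered = msg(allowed(double(msg)));
disp(filtered)
```

Indexing the table fails if `msg` contains a character whose code point exceeds the table length, so the bound has to be chosen with the real data in mind.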

3 comments

Pete sherer on 1 Dec 2022
Sorry, I didn't quite follow what you are doing here.
What is MaximumPermittedCharacter?
Why use 1:255?
Is Message basically 'events_Description'?
You are creating a lookup table that records whether each character is permitted. Then you take the input and use it to index the table. For that to work, the input characters from events_Description cannot include any character whose code point is larger than the size of the table.
If you know ahead of time that no input character will ever be beyond (say) U+23FF (which covers smart quotes, the infinity symbol, and the power-off-button character), then you can use
MaximumPermittedCharacter = 0x23ff;
L = false(1,MaximumPermittedCharacter) ;
L(native2unicode(1:255,'windows-1252')) = true;
to create the table.
The alternative is,
L = logical.empty;
L(native2unicode(1:255,'windows-1252')) = true;
MaximumPermittedCharacter = size(L,2);
then the lookup becomes
Message(Message > MaximumPermittedCharacter) = [];
Message = Message(L(Message));
which requires two passes over the message instead of one.
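As a possible variation on the two-pass lookup, the bounds check and the table lookup can be folded into one logical mask so the message is indexed only once at the end. This is a sketch, not something proposed in the thread; the names `keep`, `maxCodePoint`, `msg`, and `filtered` and the test message are hypothetical.

```matlab
% Grow-on-demand table, as in the alternative above.
L = logical.empty;
L(double(native2unicode(uint8(1:255), 'windows-1252'))) = true;
maxCodePoint = size(L, 2);

% Hypothetical message: 'café ' followed by U+65E5, which has no
% windows-1252 byte.
msg = ['caf' char(233) ' ' char(26085)];

% Build one mask: first the bounds check, then the table lookup applied
% only to the characters that passed the bounds check.
keep = msg <= maxCodePoint;
keep(keep) = L(double(msg(keep)));

% Apply the mask once.
filtered = msg(keep);
disp(filtered)
```

Whether this is faster than deleting out-of-range characters first would need to be timed on real data.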
Walter Roberson on 1 Dec 2022
"Message is basically the 'events_Description'?"
Message would be events_Description{runi}, a character vector.
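Since the original goal was to avoid the per-element loop, one way to apply the table to the whole string array at once is to join the elements, filter a single character vector, and split it again. The sketch below is an assumption-laden illustration rather than the answerer's method: it assumes that no element contains a newline, and that the five byte values from the question correspond to code points with the same numbers. The sample data is shortened and built from character codes.

```matlab
% Shortened sample input; the last element contains the stray character U+008D.
events_Description = [ ...
    "1:BRAZIL-ANAPOLIS"; ...
    string(['4:BRAZIL-CAMA' char(199) 'ARI']); ...
    string(['7:JACU' char(195) char(141) 'PE'])];

% Lookup table grown on demand, with the unwanted code points switched off.
L = logical.empty;
L(double(native2unicode(uint8(1:255), 'windows-1252'))) = true;
L([129 141 143 144 157]) = false;   % assumption: same numbers as the removed bytes
maxCodePoint = size(L, 2);

% Join all elements with a newline. Newline (code 10) is itself permitted
% by the table, so the separators survive the filtering.
joined = char(join(events_Description, newline));

% Two passes over one long character vector instead of one loop iteration
% per element.
joined(joined > maxCodePoint) = [];
joined = joined(L(double(joined)));

% Split back into one string per original element.
tdata = split(string(joined), newline);
disp(tdata)
```

Because the separators are never removed, `tdata` keeps one entry per input element, even if an element ends up empty after filtering.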


Release: R2022a

Asked: 1 Dec 2022

Commented: 1 Dec 2022
