Is there a way to detect string/letter not compatible with windows-1252 encoding?

Hi,
Is there a way to detect strings or characters that are not compatible with the windows-1252 encoding, and then remove them?

Answers (1)

Walter Roberson on 29 Nov 2022
Yes. When you call unicode2native on a character that has no counterpart in the destination character set, the byte value 26 (the ASCII SUB control character) is substituted. You can then delete those bytes and convert back:
S = char(['How now?', 2000:2029, ' brown cow'])
S = 'How now?ߐߑߒߓߔߕߖߗߘߙߚߛߜߝߞߟߠߡߢߣߤߥߦߧߨߩߪ߫߬߭ brown cow'
B = unicode2native(S, 'windows-1252')
B = 1×48
72 111 119 32 110 111 119 63 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 32 98 114 111 119 110 32 99 111 119
B(B == 26) = []
B = 1×18
72 111 119 32 110 111 119 63 32 98 114 111 119 110 32 99 111 119
Sback = native2unicode(B, 'windows-1252')
Sback = 'How now? brown cow'
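If you want to know which characters were rejected, rather than only removing them, a round-trip comparison is one option. This is a minimal sketch, not part of the original answer. It assumes every character maps to exactly one byte in the target encoding, as in the example above, and the variable names (enc, back, bad, Sclean) are illustrative only.

```matlab
% Sketch: flag characters that do not survive a round trip through windows-1252.
% Assumption: one byte per character, so positions line up after the round trip.
S    = char(['How now?', 2000:2029, ' brown cow']);
enc  = 'windows-1252';
back = native2unicode(unicode2native(S, enc), enc);

% Guard the assumption before comparing position by position.
assert(numel(back) == numel(S), 'Length changed; per-character comparison is not valid.');

bad    = back ~= S;   % true where the character came back different (substituted)
Sclean = S(~bad);     % keep only the characters that round-tripped unchanged
```

Compared with testing for byte 26 directly, this keeps a genuine char(26) that was already present in the input, because that character round-trips unchanged.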
1 Comment
Pete sherer on 30 Nov 2022
Edited: Pete sherer on 30 Nov 2022
This resolves the problem.
My data is either a cellstr or a string array. Is there a way to avoid converting each element with char inside a for loop, as I do below?
City = {'Serra'; 'Anápolis'; 'CONCEIÇÃO DE FEIRA'; 'CONCEIÇÃO DO JACUÃPE'; 'Test ߙߚߛߜߝߞߟߠߡߢߣߤߥ'};
tnew = cell(size(City));   % preallocate the cleaned result
for runi = 1:numel(City)
    ori = unicode2native(char(City{runi}), 'windows-1252');
    if any(ori == 26)      % 26 marks characters with no windows-1252 counterpart
        disp(['runi=' num2str(runi)]);
        ori(ori == 26) = [];
    end
    tnew{runi} = native2unicode(ori, 'windows-1252');
end
Also, if I want to check compatibility with UTF-8, do I still check against 26?
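One possible way to drop the explicit loop is cellfun, which hides the iteration rather than eliminating it. The following is only a sketch under assumptions, not a reply from the thread: the names toBytes, bytes, hasBad and tnew are illustrative, and the last test entry is built with char(2009:2011) so the example does not depend on the script file's encoding. On UTF-8: it can represent every Unicode code point, so this kind of substitution should not arise for well-formed text.

```matlab
% Sketch: remove characters with no windows-1252 counterpart from a cellstr.
enc  = 'windows-1252';
City = {'Serra'; 'Anápolis'; 'CONCEIÇÃO DE FEIRA'; ['Test ' char(2009:2011)]};

% Encode every element; each result is a uint8 row vector.
toBytes = @(s) unicode2native(s, enc);
bytes   = cellfun(toBytes, City, 'UniformOutput', false);

% Logical column: true where an element contained a substituted byte (26).
hasBad = cellfun(@(b) any(b == 26), bytes);

% Drop the substituted bytes and decode back to text.
tnew = cellfun(@(b) native2unicode(b(b ~= 26), enc), bytes, 'UniformOutput', false);
```

For a string array, cellstr(City) before the calls and string(tnew) afterwards should convert in and out of this form.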

Release

R2022a
