Replacing special character 'É' with 'E'
Pete sherer
on 28 Nov 2022
Hi,
Is there a Matlab function to replace the special characters (like 'É') to the regular UTF-8 or ISO-8859-1?
Thanks,
1 Comment
Stephen23
on 28 Nov 2022
"regular UTF-8 or ISO-8859-1"
Both UTF-8 (encodes all Unicode characters) and ISO-8859-1 include "É"... Perhaps you meant to ask something like "how to remove diacritics from characters?", which would match your question title.
Accepted Answer
Jonas
on 28 Nov 2022
Looks like there are only manual solutions.
Stack Overflow is your friend ;-)
6 Comments
Jonas
on 29 Nov 2022
Also, it is questionable to do this at all, since changing the letters can change the meaning of the words. In German, for example, ä, ö and ü are changed to ae, oe and ue, but the same procedure does not make sense in other languages such as Turkish.
Stephen23
on 12 Dec 2023
Edited: Stephen23
on 14 Dec 2023
@Jonas: your concern is well-founded. That function confuses two related (yet distinct) aspects of languages:
- splitting ligatures (sometimes used with lexicographical sorting)
- diacritics in graphemes
Note that ligatures are not diacritics, so splitting the ligatures Æ, Œ, etc. is not removing diacritics. The Eszett character ß also does not have any diacritics, nor is it considered to be a ligature (although it does derive from one). The lexicographical sorting rules of some languages do require treating those ligatures and characters as being equivalent to some other characters, but that is distinct from removing diacritics from characters.
The function also fails to remove diacritics from other (even Latin-based) characters, e.g. Ǣ.
The function also returns the wrong character in some cases, e.g. eth ð has no diacritic. That it is commonly transliterated into Latin script as d is irrelevant (and misleading: the digraph th would be better).
In short: the function is misnamed and does not really do what it claims.
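To make the distinction concrete, here is a small sketch (not the function discussed above) that asks Python's unicodedata for the NFKD decomposition of a few of these characters. It assumes MATLAB is configured to call Python; the variable names chars, n and dec are only illustrative. Characters that carry a diacritic decompose into a base letter plus a combining mark, whereas Æ, Œ, ß and ð have no decomposition and come back as a single code point.

```matlab
% Assumption: a Python installation is available to MATLAB (see pyenv).
chars = 'ÆŒßðǢÉ';
n = zeros(1, numel(chars));          % number of code points after NFKD
for k = 1:numel(chars)
    dec = char(py.unicodedata.normalize('NFKD', chars(k)));
    n(k) = numel(dec);
    fprintf('%s -> %s\n', chars(k), mat2str(double(dec)));
end
```

Ǣ decomposes into Æ followed by a combining macron (U+0304) and É into E followed by a combining acute accent (U+0301), so only those two gain a second code point.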
More Answers (2)
Stephen23
on 28 Nov 2022
Edited: Stephen23
on 28 Nov 2022
"Is there a Matlab function to replace the special characters (like 'É')"
You can call Python from MATLAB, and it can do the heavy lifting:
inp = 'É';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = baz(py.unicodedata.normalize('NFKD',inp)) % to remove diacritics.
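The same idea can be extended from a single character to a whole char vector. The following is only a sketch under two assumptions: MATLAB is configured to call Python, and the only marks to drop are those in the Combining Diacritical Marks block (U+0300 to U+036F), which covers common Latin-script accents but not every combining character in Unicode. The names dec and isMark are illustrative.

```matlab
% Assumption: a Python installation is available to MATLAB (see pyenv).
inp = 'ÉÀÄ café';
dec = char(py.unicodedata.normalize('NFKD', inp)); % base letters + combining marks
isMark = dec >= 768 & dec <= 879;                  % U+0300..U+036F (assumed sufficient)
out = dec(~isMark)                                 % keep everything that is not a mark
```

Characters without a decomposition (such as Æ, ß or ð) pass through unchanged, so this removes diacritics only and does not transliterate.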
0 Comments
John D'Errico
on 28 Nov 2022
Edited: John D'Errico
on 28 Nov 2022
Easy peasy.
str = 'ABCDEFGHIJKÉÉÀÀÄÄabcdefghijkl'
strrep(str,'É','E')
If there are other special characters you want replaced, strrep will handle them too, but it looks like you would need to do them one at a time with strrep. But other tools would certainly work too. Certainly regexp, but I've never been very good at regular expressions. :) This will work though:
badchar = 'ÉÀÄ';
goodchar = 'EAA';
[u,v] = ismember(str,badchar); % u: which characters to replace, v: their index into badchar
str(u) = goodchar(v(u))
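If a replacement needs more than one character (for example the German convention of writing Ä as Ae), the one-to-one indexing above no longer fits. One possible alternative is the replace function (available since R2016b), which accepts several old/new pairs in one call. The mapping below is purely hypothetical and would have to be chosen per language.

```matlab
str = 'ABCDEFGHIJKÉÉÀÀÄÄabcdefghijkl';
old = ["É", "À", "Ä"];
new = ["E", "A", "Ae"];        % hypothetical mapping, not a general rule
out = replace(str, old, new)   % all pairs are applied in a single call
```

Unlike the ismember approach, the output may be longer than the input, so this is not suitable where the string length must be preserved.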
1 Comment
Robert Wagner
on 12 Dec 2023
but I've never been very good at regular expressions. :) ---> I've never tried to be in the first place... :-)))