Split string into graphemes
Split Text into Graphemes
Split text into graphemes using the splitGraphemes function.
A grapheme (also known as a grapheme cluster) is the Unicode term for a human-perceived character. Some graphemes contain multiple code units. For example, the "smiling face with sunglasses" emoji (😎, code point U+1F60E) is a single grapheme but comprises two UTF-16 code units, 0xD83D and 0xDE0E.
Split the text "Smile! 😎" into graphemes.
str = "Smile! " + compose("\xD83D\xDE0E")
str = "Smile! 😎"
newStr = splitGraphemes(str)
newStr = 8x1 string
    "S"
    "m"
    "i"
    "l"
    "e"
    "!"
    " "
    "😎"
Here, the function does not split the emoji into multiple characters.
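To make the difference between code units and graphemes concrete, the following minimal sketch compares the two counts for the emoji alone. It assumes that Text Analytics Toolbox is installed so that splitGraphemes is available; the variable names numCodeUnits and numGraphemes are illustrative only.

```matlab
% Build the emoji from its two UTF-16 code units, as in the example above.
emoji = compose("\xD83D\xDE0E");

% strlength counts UTF-16 code units, so the emoji counts as 2.
numCodeUnits = strlength(emoji)

% splitGraphemes keeps the emoji together as a single grapheme.
numGraphemes = numel(splitGraphemes(emoji))
```

Counting with strlength would therefore overstate the number of human-perceived characters in text that contains such emoji.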
str — Input text
string array | character vector | cell array of character vectors
Input text, specified as a string array, character vector, or cell array of
character vectors. For string array and cell array input, each element of
str must have the same number of graphemes.
If the number of graphemes is not the same for every element of
str, then call the function in a for-loop to split the elements
of str one at a time.
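One way to write such a loop is sketched below. The input strings and the variable name graphemes are hypothetical, and the results are stored in a cell array because each element produces an output of a different length.

```matlab
% Hypothetical input whose elements have different numbers of graphemes.
str = ["cat" "horse" "Smile!"];

% Use a cell array for the results, because their sizes differ.
graphemes = cell(size(str));
for i = 1:numel(str)
    % Each call receives a string scalar and returns a
    % numGraphemes-by-1 string array.
    graphemes{i} = splitGraphemes(str(i));
end
```

After the loop, graphemes{i} holds the graphemes of str(i).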
newStr — Split graphemes
string array | cell array of character vectors
Split graphemes, returned as a string array or a cell array of character vectors. If
str is a string array, then newStr is also a string array. Otherwise,
newStr is a cell array of character vectors.
The size of newStr depends on the input:
If str is a string scalar or a character vector, then newStr is a numGraphemes-by-1 string array or cell array, where numGraphemes is the number of graphemes.
If str is an M-by-1 string array or cell array, then newStr is an M-by-numGraphemes string array or cell array.
If str is a 1-by-N string array or cell array, then newStr is a 1-by-N-by-numGraphemes string array or cell array.
For a string array or cell array of any size, the function orients the split graphemes along the first trailing dimension with size 1.
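The orientation rule can be checked with a short sketch. The inputs below are hypothetical, and every element has three graphemes. Going by the rule, the output sizes should be 3-by-1 for the scalar, 2-by-3 for the column input, and 1-by-2-by-3 for the row input.

```matlab
% Hypothetical inputs; every element has 3 graphemes.
colStr = ["abc"; "xyz"];   % 2-by-1 string array
rowStr = ["abc" "xyz"];    % 1-by-2 string array

% Graphemes go along the first trailing dimension of size 1.
szScalar = size(splitGraphemes("abc"))   % scalar input: dimension 1
szCol = size(splitGraphemes(colStr))     % column input: dimension 2
szRow = size(splitGraphemes(rowStr))     % row input: dimension 3
```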
Introduced in R2019a