tokenizedDocument with special characters

Said Elaiwat on 6 Dec 2022
Edited: Ravi Chaitanya on 29 Dec 2022
How can I use the "tokenizedDocument" function to tokenize text that contains special characters and mixed case? I need to keep the tokens exactly as they are and use only the space as a delimiter. For example:
"c1ccc2Nc3ccccc3C Nc2c1C"
There are two tokens: "c1ccc2Nc3ccccc3C" and "Nc2c1C"
Thanks

Answers (1)

Ravi Chaitanya on 29 Dec 2022
Edited: Ravi Chaitanya on 29 Dec 2022
Hello Said,
As I understand it, you want to tokenize text that contains special characters and mixed-case letters, and you want the tokenized output to keep both the grouping and the case of the original text.
One way to achieve that is to pass custom tokens to the "tokenizedDocument" function with the 'CustomTokens' option. The code snippet below shows an example:
str = "I am a BeGiNnEr in C# , C++";
documents = tokenizedDocument(str,'CustomTokens',["C++" "C#"])
In the output, you can see that the case of the characters is retained (this is the default behavior) and that the custom tokens C# and C++ are kept whole rather than split.
Please refer to the MathWorks documentation on tokenizedDocument to learn more about the other options available for handling special characters.
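Since the question asks for the space to be the only delimiter, another option may be to split the string yourself and pass the pieces in with 'TokenizeMethod' set to 'none', so that tokenizedDocument does no further splitting. This is only a minimal sketch: it assumes Text Analytics Toolbox is installed and that a word vector wrapped in a cell array is accepted as a single pre-tokenized document. The variable names are just illustrative.

```matlab
% Split on whitespace only; case and special characters are left untouched.
str = "c1ccc2Nc3ccccc3C Nc2c1C";
tokens = strsplit(str);   % string array holding the two pieces

% Assumption: 'TokenizeMethod','none' makes tokenizedDocument treat the
% input as already tokenized. The cell wrapper marks it as one document.
documents = tokenizedDocument({tokens},'TokenizeMethod','none')
```

Compared with 'CustomTokens', this approach does not require listing every token in advance, which may matter if the strings are not known ahead of time.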
