No changes when using function erasePunctuation to remove digits.
Ismat Mohd Sulaiman
on 16 Mar 2021
Commented: Ismat Mohd Sulaiman
on 5 Jul 2021
I'm trying to remove the digits from a document that has already been tokenized.
However, after calling the erasePunctuation function, the updated document is unchanged: no digits were removed. I've checked the token types, and the tokenizer does recognize these tokens as digits. Please help. Thanks.

The output:

Answers (1)
Cris LaPierre
on 16 Mar 2021
Edited: Cris LaPierre
on 16 Mar 2021
erasePunctuation only ever erases punctuation; it does not remove numbers. The 'digits' specification tells it which type of token to remove punctuation from, not which tokens to delete. See the description in the erasePunctuation documentation.
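You could try removing the digits with the following.
% Look up the type of every token in the document.
tkD = tokenDetails(cleanDoc);
% Remove the tokens whose type is "digits".
cleanDoc = removeWords(cleanDoc,tkD.Token(tkD.Type=="digits"));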
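
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. The sample sentence and the variable names (str, doc, docPunct, digitTokens, docNoDigits) are made up for illustration and are not from the question; it assumes Text Analytics Toolbox is installed.

```matlab
% Hypothetical sample text, not the asker's data.
str = "Order 66 shipped 3 boxes in 2021";
doc = tokenizedDocument(str);

% erasePunctuation with 'TokenTypes' only strips punctuation characters
% from tokens of the listed types; it does not delete the tokens.
docPunct = erasePunctuation(doc,'TokenTypes','digits');

% Look up the type of each token and collect the ones typed as digits.
tkD = tokenDetails(doc);
digitTokens = unique(tkD.Token(tkD.Type == "digits"));

% Remove those tokens from the document.
docNoDigits = removeWords(doc,digitTokens);
```

Comparing docPunct and docNoDigits side by side should show why the original call appeared to do nothing: only the removeWords step drops tokens. Note that removeWords matches whole tokens, so a number embedded inside a longer token would not be affected.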