How to find the most common words in a text with MATLAB
bita hallajian
on 28 Oct 2017
Commented: Charmaine Tan
on 26 Nov 2018
How can I tag parts of speech (POS) on nouns and verbs in MATLAB? Is it related to regular expressions? I know that regular expressions find a pattern in a text, but I want to find the most common words in my texts, tag them with their POS (that is, whether each word is a noun or a verb), and then exchange them by POS to make an unfamiliar pair of words. How can I find the most common words in texts with MATLAB? Is there a solution for that, or should I use other software?
Accepted Answer
Christopher Creutzig
on 2 Nov 2017
Edited: Christopher Creutzig
on 26 Nov 2018
Finding the most common words is easy with Text Analytics Toolbox:
>> sonnets = extractFileText("sonnets.txt");
>> sonnets = erasePunctuation(sonnets);
>> tokenizedSonnets = tokenizedDocument(lower(sonnets));
>> bag = bagOfWords(tokenizedSonnets);
>> topkwords(bag, 10)
ans =
  10×2 table
     Word     Count
    ______    _____
    "and"      490
    "the"      436
    "to"       409
    "my"       371
    "of"       370
    "i"        344
    "in"       321
    "that"     320
    "thy"      281
    "thou"     234
You probably want to remove some words (check out removeWords and stopWords). POS tagging is supported in release R2018b and later; see addPartOfSpeechDetails.
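The two suggestions above could be combined roughly as follows. This is only a sketch, assuming Text Analytics Toolbox in R2018b or later and the sonnets.txt example file used above. The variable names (details, keep, result) and the choice to tag the raw text before any cleanup are illustrative assumptions, not something stated in the answer.

```matlab
% Sketch only: assumes Text Analytics Toolbox, R2018b or later.
txt = extractFileText("sonnets.txt");

% Part 1: most common words after dropping the toolbox's stop-word list.
docs = tokenizedDocument(lower(erasePunctuation(txt)));
bag  = removeWords(bagOfWords(docs), stopWords);
topkwords(bag, 10)

% Part 2: POS tagging. Tokenize the raw text here (assumption: the tagger
% works better when punctuation and stop words are still in place).
tagged  = addPartOfSpeechDetails(tokenizedDocument(txt));
details = tokenDetails(tagged);   % one row per token, with a PartOfSpeech column

% Keep only tokens tagged as noun or verb and count them case-insensitively.
keep  = details.PartOfSpeech == "noun" | details.PartOfSpeech == "verb";
words = lower(details.Token(keep));
pos   = details.PartOfSpeech(keep);

[groups, word, tag] = findgroups(words, pos);
count  = accumarray(groups, 1);
result = sortrows(table(word, tag, count), 'count', 'descend');
head(result, 10)
```

The result table lists each word together with its tag, so the most frequent nouns and verbs can be read off separately before deciding which ones to exchange.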
2 comments
Christopher Creutzig
on 2 May 2018
What command(s) did you try to read that file? The error message looks like you tried to read it as a table; try using the commands listed above instead.
More Answers (2)
Sarah Palfreyman
on 30 Apr 2018
Edited: Sarah Palfreyman
on 30 Apr 2018
2 comments
IORUNDU GABRIEL
on 16 May 2018
What is the earliest MATLAB release that supports Text Analytics Toolbox?
Charmaine Tan
on 26 Nov 2018
Hi, after finding my topkwords (most frequent words), how can I plot a histogram of these?
2 comments
Christopher Creutzig
on 26 Nov 2018
txt = extractFileText('sonnets.txt');
td = tokenizedDocument(lower(txt));
td = erasePunctuation(td);
bow = bagOfWords(td);
top = topkwords(bow,20);
bar(top.Count)
set(gca,'XTick',1:size(top,1),'XTickLabel',top.Word,'XTickLabelRotation',45)
(In general, it's a good idea not to ask a new question as an “answer,” but to open a new question instead. It helps other people searching MATLAB Answers in the future.)
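For readers without Text Analytics Toolbox, the same count-and-plot idea can be sketched in base MATLAB. This is an illustration under stated assumptions, not a replacement for topkwords: the sample sentence is made up, and the regular expression treats any run of letters or apostrophes as a word, which is a much cruder tokenizer than tokenizedDocument.

```matlab
% Base-MATLAB sketch (no toolbox). The sample text is invented for illustration.
txt = "The cat sat on the mat. The dog sat too, and the cat left.";

% Lowercase, then extract runs of letters/apostrophes as words.
words = string(regexp(char(lower(txt)), '[a-z'']+', 'match'));

% Count each distinct word, then sort by descending frequency.
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);
[counts, order] = sort(counts, 'descend');
uniqueWords = uniqueWords(order);

% Keep the k most frequent words in a table shaped like the topkwords output.
k = min(5, numel(uniqueWords));
top = table(uniqueWords(1:k).', counts(1:k), 'VariableNames', {'Word', 'Count'});

% Bar chart with the words as tick labels, as in the code above.
bar(top.Count)
set(gca, 'XTick', 1:k, 'XTickLabel', top.Word, 'XTickLabelRotation', 45)
ylabel('Count')
```

Because the table has the same Word and Count variables, the plotting lines are interchangeable with the toolbox version.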