Count words for word cloud creation


T = wordCloudCounts(str)



T = wordCloudCounts(str) tokenizes and preprocesses the text in str for word cloud creation and returns a table T of words and frequency counts.
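This page does not list the exact preprocessing steps. The sketch below is a rough illustration only: it approximates a similar pipeline with other Text Analytics Toolbox functions (tokenizedDocument, lower, erasePunctuation, removeStopWords, normalizeWords, bagOfWords, topkwords). It is an assumption, not the implementation of wordCloudCounts. Because normalizeWords stems words by default, its Word column contains stems rather than whole words such as "eyes".

```matlab
% Rough approximation of word-cloud preprocessing (assumed pipeline,
% NOT the actual implementation of wordCloudCounts).
str = ["An example of a short sentence."; "A second short sentence."];

documents = tokenizedDocument(str);      % split text into tokens
documents = lower(documents);            % fold case so "Short" and "short" match
documents = erasePunctuation(documents); % drop punctuation
documents = removeStopWords(documents);  % drop common words such as "a" and "of"
documents = normalizeWords(documents);   % reduce words to stems (default style)

bag = bagOfWords(documents);             % count word occurrences across documents
T = topkwords(bag,10)                    % Word/Count table, highest counts first
```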


Examples

Extract the text from sonnets.txt using extractFileText.

str = extractFileText("sonnets.txt");

View the first sonnet.

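% Locate the Roman numerals that head the first and second sonnets.
i = strfind(str,"I");
ii = strfind(str,"II");
start = i(1);
fin = ii(1);
% Extract the text between them.
extractBetween(str,start,fin-1)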
ans = 
       From fairest creatures we desire increase,
       That thereby beauty's rose might never die,
       But as the riper should by time decease,
       His tender heir might bear his memory:
       But thou, contracted to thine own bright eyes,
       Feed'st thy light's flame with self-substantial fuel,
       Making a famine where abundance lies,
       Thy self thy foe, to thy sweet self too cruel:
       Thou that art now the world's fresh ornament,
       And only herald to the gaudy spring,
       Within thine own bud buriest thy content,
       And tender churl mak'st waste in niggarding:
         Pity the world, or else this glutton be,
         To eat the world's due, by the grave and thee.

Tokenize and preprocess the sonnets text and create a table of word frequency counts.

T = wordCloudCounts(str);

View the first few words in the table.

head(T)
ans=8×2 table
     Word     Count
    ______    _____

    "thy"      281 
    "thou"     235 
    "love"     188 
    "thee"     162 
    "eyes"      90 
    "doth"      88 
    "make"      63 
    "mine"      63 
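The table is intended as input for a word cloud. Here is a minimal sketch, assuming Text Analytics Toolbox and the sonnets.txt example file are available. The wordcloud function accepts a table together with the names of its word variable and its size variable.

```matlab
% Build the word-frequency table from the sonnets text.
str = extractFileText("sonnets.txt");
T = wordCloudCounts(str);

% Draw a word cloud, sizing each word by its Count value.
figure
wc = wordcloud(T,"Word","Count");
title("Sonnets Word Cloud")
```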

Input Arguments

str

Input text, specified as a string array, character vector, or cell array of character vectors.

Example: ["An example of a short sentence."; "A second short sentence."]

Data Types: string | char | cell
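Here is a minimal sketch of passing the same two sentences in each supported input type. The variable names strArray, charVec, and cellArr are illustrative only. Comparing the resulting tables is one way to confirm that the input types behave alike on your own data.

```matlab
% The same text in each of the three supported input types.
strArray = ["An example of a short sentence."; "A second short sentence."];
charVec  = 'An example of a short sentence. A second short sentence.';
cellArr  = {'An example of a short sentence.'; 'A second short sentence.'};

% Count words for each input type.
T1 = wordCloudCounts(strArray);
T2 = wordCloudCounts(charVec);
T3 = wordCloudCounts(cellArr);
```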

Output Arguments

T

Table of word counts sorted in order of importance. The table has these columns:

Word: String scalar of the word.
Count: The number of times the word appears in the documents. The function groups the counts of words that differ only by case or have a common stem according to normalizeWords. For example, the function groups the counts for "walk", "Walking", "walking", and "walks".
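Here is a small sketch of the grouping described above, using four made-up sentences that contain "walk", "Walking", "walking", and "walks". According to the description, these forms should be combined into a single row with a count of 4. This page does not specify which surface form appears in the Word column, so the sketch finds the row by prefix.

```matlab
% Four forms of "walk" that differ only by case or inflection.
str = ["I walk to work."; "Walking is fun."; "She was walking home."; "He walks daily."];
T = wordCloudCounts(str);

% Find the row holding the grouped "walk" forms and display it.
idx = startsWith(lower(T.Word),"walk");
T(idx,:)
```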

Introduced in R2017b