Correct Spelling in Documents

This example shows how to correct spelling in documents using Hunspell.

Load Text Data

Create an array of tokenized documents.

str = [
    "Correctly spelled worrds are important for lemmatization."
    "Text Analytics Toolbox providesfunctions for spelling correction."
    "The phrase Higgs boson is a technical term."];
documents = tokenizedDocument(str)

documents = 
  3×1 tokenizedDocument:

    8 tokens: Correctly spelled worrds are important for lemmatization .
    8 tokens: Text Analytics Toolbox providesfunctions for spelling correction .
    9 tokens: The phrase Higgs boson is a technical term .

Correct Spelling

Correct the spelling of the documents using the correctSpelling function.

updatedDocuments = correctSpelling(documents)

updatedDocuments = 
  3×1 tokenizedDocument:

    8 tokens: Correctly spelled words are important for solemnization .
    9 tokens: Text Analytics Toolbox provides functions for spelling correction .
    9 tokens: The phrase Riggs boson is a technical term .

Notice that:

The input word "worrds" has been changed to "words".
The input word "lemmatization" has been changed to "solemnization".
The input word "providesfunctions" has been split into the two words "provides" and "functions".
The input word "Higgs" has been changed to "Riggs".
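
To list the changes programmatically rather than by eye, one option is to compare the vocabularies of the input and corrected documents. The following is a minimal sketch, assuming the Vocabulary property of tokenizedDocument and the standard setdiff function; the variable names changedWords and newWords are illustrative only. A replacement that already appears elsewhere in the input does not show up in newWords, so treat the result as a quick check rather than a full diff.

```matlab
% Recreate the documents from this example and correct them.
str = [
    "Correctly spelled worrds are important for lemmatization."
    "Text Analytics Toolbox providesfunctions for spelling correction."
    "The phrase Higgs boson is a technical term."];
documents = tokenizedDocument(str);
updatedDocuments = correctSpelling(documents);

% Words that appear only in the input were changed or split.
changedWords = setdiff(documents.Vocabulary,updatedDocuments.Vocabulary)

% Words that appear only in the output are the replacements.
newWords = setdiff(updatedDocuments.Vocabulary,documents.Vocabulary)
```

Reviewing changedWords before continuing is a simple way to spot technical terms, such as "Higgs", that the software should leave alone.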

Specify Custom Words

To prevent the software from updating particular words, you can provide a list of known words using the KnownWords name-value argument of the correctSpelling function.

Correct the spelling of the documents again and specify the words "lemmatization" and "Higgs" as known words.

updatedDocuments = correctSpelling(documents,KnownWords=["lemmatization","Higgs"])

updatedDocuments = 
  3×1 tokenizedDocument:

    8 tokens: Correctly spelled words are important for lemmatization .
    9 tokens: Text Analytics Toolbox provides functions for spelling correction .
    9 tokens: The phrase Higgs boson is a technical term .

Notice here that the words "lemmatization" and "Higgs" remain unchanged.
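
When the list of known words grows, it can help to keep it in a variable and confirm that each word survives the correction. The following is a sketch under those assumptions; the names knownWords and tf are illustrative. It also uses the comma-separated name-value syntax, which is equivalent to KnownWords=... and works in releases that predate the Name=Value syntax (introduced in R2021a).

```matlab
% Recreate the documents from this example.
str = [
    "Correctly spelled worrds are important for lemmatization."
    "Text Analytics Toolbox providesfunctions for spelling correction."
    "The phrase Higgs boson is a technical term."];
documents = tokenizedDocument(str);

% Keep the domain vocabulary in one variable so it can be reused.
knownWords = ["lemmatization" "Higgs"];

% Comma-separated name-value syntax, equivalent to KnownWords=knownWords.
updatedDocuments = correctSpelling(documents,'KnownWords',knownWords);

% Check that every known word is still present after correction.
tf = all(ismember(knownWords,updatedDocuments.Vocabulary))
```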
