Replace text in words of documents using a regular expression
Update Text in Words
Replace words that begin with "s" and end with "e", and have at least one character between them. To match whole words, use "^" to match the start of a word and "$" to match the end of the word.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"])

documents = 
  2x1 tokenizedDocument:

    6 tokens: an example of a short sentence
    4 tokens: a second short sentence

expression = "^s(\w+)e$";
replace = "thing";
newDocuments = regexprep(documents,expression,replace)

newDocuments = 
  2x1 tokenizedDocument:

    6 tokens: an example of a short thing
    4 tokens: a second short thing
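The per-word behavior of "^" and "$" comes from the tokenized input. For comparison, the following sketch applies the base MATLAB regexprep function to a plain string array (an assumption made only for this contrast). On untokenized text, "^" and "$" anchor to the start and end of each whole string, so the word-boundary anchors "\<" and "\>" are needed to reproduce the word-level result from the example above.

```matlab
% Plain string array (not tokenized): each element is one piece of text.
str = ["an example of a short sentence"; "a second short sentence"];

% "^" and "$" anchor to the start and end of each whole string.
% Neither string starts with "s", so nothing is replaced.
unchanged = regexprep(str,"^s(\w+)e$","thing");

% "\<" and "\>" anchor to the start and end of words inside the text,
% so only "sentence" matches in each string.
wordLevel = regexprep(str,"\<s(\w+)e\>","thing")
```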
If you do not use "^" and "$", then you can match substrings of the words. Replace all vowels with "_".
expression = "[aeiou]";
replace = "_";
newDocuments = regexprep(documents,expression,replace)

newDocuments = 
  2x1 tokenizedDocument:

    6 tokens: _n _x_mpl_ _f _ sh_rt s_nt_nc_
    4 tokens: _ s_c_nd sh_rt s_nt_nc_
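Conversely, adding the anchors to the same character class restricts it to whole tokens. As a sketch using the same documents, the expression "^[aeiou]$" matches only tokens that consist of a single vowel, so only the word "a" is replaced. The joinWords call in the check is used only to compare the result as text.

```matlab
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% "^" and "$" force the one-character class to match an entire token,
% so multi-letter words such as "an" and "example" are left unchanged.
expression = "^[aeiou]$";
replace = "_";
newDocuments = regexprep(documents,expression,replace)
```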
Include Captured Tokens in Word Replacement
Replace variations of the word "walk" by capturing the letters that follow each instance of "walk" and reusing them in the replacement text.
documents = tokenizedDocument([ ...
    "I walk"
    "they walked"
    "we are walking"])

documents = 
  3x1 tokenizedDocument:

    2 tokens: I walk
    2 tokens: they walked
    3 tokens: we are walking

expression = "walk(\w*)";
replace = "ascend$1";
newDocuments = regexprep(documents,expression,replace)

newDocuments = 
  3x1 tokenizedDocument:

    2 tokens: I ascend
    2 tokens: they ascended
    3 tokens: we are ascending
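Captured tokens can also be named and referenced by name instead of by position. The following minimal sketch uses the base MATLAB regexprep function on a string array of individual words (chosen here for illustration) with the named-token syntax (?<name>expr) in the expression and $<name> in the replacement. The token name suffix is an arbitrary, hypothetical choice.

```matlab
words = ["walk" "walked" "walking"];

% (?<suffix>\w*) captures the letters after "walk" under the name "suffix",
% and $<suffix> inserts those letters after the new stem "ascend".
expression = "walk(?<suffix>\w*)";
replace = "ascend$<suffix>";
newWords = regexprep(words,expression,replace)
```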
documents — Input documents
Input documents, specified as a tokenizedDocument array.
replace — Replacement text
character vector | cell array of character vectors | string array
Replacement text, specified as a character vector, a cell array of character vectors, or a string array, as follows:
If replace is a single character vector and expression is a cell array of character vectors, then regexprep uses the same replacement text for each expression.
If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.
If both replace and expression are cell arrays of character vectors, then they must contain the same number of elements. regexprep pairs each replace element with its corresponding element in expression.
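The pairing rules for cell arrays can be sketched with the base MATLAB regexprep function on a character vector. The input text is hypothetical, and the expressions are chosen so that their matches do not overlap, which keeps the result independent of the order in which the expressions are applied.

```matlab
txt = 'the cat chased the dog';

% Single replacement, several expressions: both 'cat' and 'dog' become 'pet'.
samePet = regexprep(txt,{'cat','dog'},'pet');

% Equal-length cell arrays: 'cat' pairs with 'fox', and 'dog' pairs with 'hen'.
paired = regexprep(txt,{'cat','dog'},{'fox','hen'});
```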
The replacement text can include regular characters, special characters (such as tabs or new lines), or replacement operators, as shown in the following tables.
Replacement Operator    Description
$0                      Portion of the input text that is currently a match
$`                      Portion of the input text that precedes the current match
$'                      Portion of the input text that follows the current match
$N                      Nth token
$<name>                 Named token
${cmd}                  Output returned when MATLAB executes the command, cmd
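Two of the replacement operators, $0 and ${cmd}, are sketched below with the base MATLAB regexprep function on a character vector (the sample text is taken from the earlier example). Inside ${cmd}, $1 refers to the first captured token, so the command receives each matched letter.

```matlab
txt = 'an example of a short sentence';

% $0 inserts the whole current match, here wrapped in brackets.
bracketed = regexprep(txt,'short','[$0]');

% \<(\w) captures the first letter of each word; ${upper($1)} replaces it
% with the output of the MATLAB command upper applied to that letter.
capitalized = regexprep(txt,'\<(\w)','${upper($1)}');
```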
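Special Character    Representation
\a                   Alarm (beep)
\b                   Backspace
\f                   Form feed
\n                   New line
\r                   Carriage return
\t                   Horizontal tab
\v                   Vertical tab
\char                Any character with special meaning in regular expressions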
that you want to match literally (for example, use