Traversing Text Document Matlab

xRobot on 17 Nov 2019
Edited: Adam Danz on 19 Nov 2019
Please provide guidance on this particular inquiry. All responses are highly valued and will be used to further my knowledge (I am not just looking for a copy-and-paste solution). I am attempting to read a Microsoft Word dictionary into MATLAB. From there I would like to traverse it, extract words of a specific length, say four-letter words, and put them into an array. Then I would like to select random words from the array and put them into a matrix.

Answers (1)

Adam Danz on 17 Nov 2019
Edited: Adam Danz on 17 Nov 2019
Reading from word doc
Here's the general approach to reading a Microsoft Word document. Note that actxserver relies on COM, so this requires Windows with Word installed.
directory = 'C:\Users\AOC\Documents\MATLAB';
file = 'myDocFile.docx';
% Full path to the MS Word file
filePath = fullfile(directory,file);
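% Read the MS Word file through a COM server (actxserver)
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
% Pull the whole document body out as one char array
txt = wdoc.Content.Text;
% Close Word and release the COM object
Quit(word)
delete(word)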
The variable txt is a char array containing the text in your document.
Extracting 4-letter words
There are several approaches you could use. This one is fast and doesn't require segmenting the text into words and counting the length of each. Instead, it uses a regular expression to search for this pattern:
[non-letter],[4-letters],[non-letter]
It also uses strtrim() to remove the leading and trailing white space.
% Extract 4-letter words.
s = strtrim(regexp(txt, '([^a-zA-Z])[a-zA-Z]{4}([^a-zA-Z])', 'match'));
s is a 1xn cell array of 4-letter words as character arrays.
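One caveat with a pattern that matches the non-letter on both sides: each match consumes those neighbouring characters, so when two four-letter words are separated by a single space the second one can be skipped, a word at the very start or end of the text is missed, and punctuation such as a comma survives strtrim(). A minimal sketch of an alternative using lookaround assertions is below. The sample string txtDemo and the variable s4 are made up for illustration; substitute txt from the document.

```matlab
% Hypothetical sample text standing in for txt
txtDemo = 'This word list has some four-letter items: tree, lamp and stone.';

% Match exactly four letters that are neither preceded nor followed by a letter.
% Lookarounds do not consume the neighbouring characters, so adjacent words
% and words at the start or end of the text are all found, with no trimming needed.
s4 = regexp(txtDemo, '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])', 'match');
```

Here s4 is a 1x7 cell array holding 'This', 'word', 'list', 'some', 'four', 'tree' and 'lamp'. Whether a hyphenated fragment like "four" should count as a word is a judgment call for your dictionary.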
Randomly select words
A numeric matrix can't hold text, but a cell array can. The example below chooses n random words from the extracted words.
n = 10;
if n > numel(s)
    error('There are only %d words available. You selected %d words.', numel(s), n)
end
% randi samples with replacement, so the same word can be picked more than once
randIdx = randi(numel(s),1,n);
randWords = s(randIdx); % Here is your random selection
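If the selection should not repeat a word, randperm is one option, and because every selected word has the same length the result can also be stacked into a char matrix. This is only a sketch under those assumptions; words4, picked and wordMatrix are hypothetical names, with words4 standing in for s.

```matlab
% Hypothetical word list standing in for s
words4 = {'tree','lamp','word','list','some','four'};
n = 3;

% randperm(k, n) returns n unique indices from 1..k, so no word repeats
idx = randperm(numel(words4), n);
picked = words4(idx);

% Equal-length words can be stacked into an n-by-4 char matrix, one word per row
wordMatrix = char(picked);
```

Each row of wordMatrix is one word, so wordMatrix(2,:) returns the second selected word as a char vector.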
5 comments
xRobot on 19 Nov 2019
fileID = fopen('mylist.odt','r');
formatSpec = '%s';
words = fscanf(fileID,formatSpec);
I have used the above code to read in the file. It read in as a 1x11102 char. What I would like to do is convert this to a string array.
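For context on why the result is a single 1x11102 char: with the '%s' format, fscanf skips white space and concatenates everything it reads, so the word boundaries are lost. Also, an .odt file is a zipped archive, not plain text, so it would need to be exported as .txt first. A minimal sketch of one way to get a string array, assuming a plain-text file with white-space-separated words (the file name and contents below are hypothetical and created only so the example runs on its own):

```matlab
% Create a small plain-text file so the sketch is self-contained
% (hypothetical file name and contents)
fname = fullfile(tempdir, 'mylist_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'tree lamp\nstone word\n');
fclose(fid);

% fileread keeps the white space, so the words stay separated
raw = fileread(fname);

% strsplit splits on runs of white space by default; string() converts the
% resulting cell array into a 1xN string array, one word per element
wordList = string(strsplit(strtrim(raw)));

% Keep only the four-letter entries
fourLetter = wordList(strlength(wordList) == 4);

% Remove the temporary demo file
delete(fname);
```

With the demo contents above, fourLetter holds "tree", "lamp" and "word".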
Adam Danz on 19 Nov 2019
Edited: Adam Danz on 19 Nov 2019
