word count matrix problem
Can anyone see how I can correct this code for the wordCount matrix? I am counting the unique words across all the files. I also have a 2469 (unique words) by 160 (reviews) matrix. I have attached a snippet of the matrix for preview.

The problem is that I am completely stuck on how to allocate the word counts to the review each one belongs to. What happens instead is that the total count appears in the first column and the remaining columns are all zero. I would very much appreciate it if someone could look at my code and see if they can find the problem. It is probably a really stupid error, but I just cannot see it. I have tried loads of methods to solve it, and this appears to be the best one so far (for me at least).
clear all;
% Collects requested files from a specified folder and inserts them into an array
fpath = ('C:\Users\Willem\Documents\MATLAB\fold1');
% Returns an error if folder is not found
if ~isdir(fpath)
    errorMessage = sprintf('Error: The following folder does not exist:\n%s', fpath);
    uiwait(warndlg(errorMessage));
    return;
end
files = dir(fullfile(fpath,'*.oneline'));
nfiles = length(files);
data = {};
docArray = {};
if true
    data = [];
    % Separates each file's data strings into individual columns within the matrix
    for k = 1:nfiles
        thisdata = importdata(fullfile(fpath,files(k).name)); % imports the data into the matrix array
        nrow = length(thisdata); % extend number of rows if needed
        docArray(1:nrow,end+1) = thisdata(:); % displays each review per column
        data = [data; importdata(fullfile(fpath,files(k).name))]; % creates single column array of all the words
    end
end
uniqueWords = unique(data); % Checks for unique words in all the review strings
% Counts the number of times each unique word appears in each review
wordCount = zeros(numel(uniqueWords),k);
for j = 1:length(uniqueWords)
    counter = 0;
    for l = 1:length(data)
        if isequal(uniqueWords{j},data{l})
            counter = counter +1;
        end
    end
    wordCount(j) = counter;
end
7 comments
Walter Roberson
on 10 Nov 2013
Why are you using importdata() twice on the same file? The data is already stored in thisdata.
data = [data; thisdata];
Walter Roberson
on 10 Nov 2013
Why did you leave out the code for extending the matrix?
if size(docArray,1) < nrow
docArray(nrow,1) = {}; %extend number of rows if needed
end
Willem
on 10 Nov 2013
Walter Roberson
on 11 Nov 2013
Edited: Walter Roberson
on 11 Nov 2013
Make it
docArray{nrow,1} = '';
Willem
on 11 Nov 2013
Willem
on 11 Nov 2013
Willem
on 12 Nov 2013
Answers (3)
Here is an alternative, and probably simpler, solution for counting occurrences. Once you request the third output from UNIQUE, the counting itself is a one-line call:
>> words = { 'john', 'jim', 'john', 'john', 'james', 'john', 'james' } ;
>> [uniqueWords,~,ic] = unique( words )
uniqueWords =
'james' 'jim' 'john'
ic =
3 2 3 3 1 3 1
>> counts = accumarray( ic(:), 1 )
counts =
2
1
4
3 comments
Willem
on 11 Nov 2013
Walter Roberson
on 12 Nov 2013
The poster would like to have a per-review count of each unique word.
The adapted code would probably use ismember() on each review (because not every review will have every unique word and order becomes important for the output.)
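As an alternative to calling ismember() on each review, the unique/accumarray idea can be carried over to the per-review case by giving accumarray two subscripts per word: the word's index from unique and the number of the review it came from. The following is only a minimal sketch under stated assumptions: the sample words and the reviewIdx vector are made up for illustration and are not part of the poster's code.

```matlab
% Hypothetical sample data: all words in one column, plus the review each came from
words     = {'good'; 'film'; 'good'; 'bad'; 'film'};
reviewIdx = [1; 1; 1; 2; 2];   % assumed bookkeeping vector, one entry per word

% ic(i) is the row of uniqueWords that words{i} maps to
[uniqueWords, ~, ic] = unique(words);

% Each (word, review) pair adds 1 to one cell of the count matrix
nReviews  = max(reviewIdx);
wordCount = accumarray([ic(:), reviewIdx(:)], 1, [numel(uniqueWords), nReviews]);
disp(wordCount)
```

Rows follow the sorted order of uniqueWords and columns follow the review numbers, so a word that never appears in a review simply keeps a zero in that column.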
Willem
on 12 Nov 2013
Walter Roberson
on 10 Nov 2013
Hint:
thisreview = docArray(:,k);
if isequal(uniqueWords{j}, thisreview{l})
5 comments
Willem
on 10 Nov 2013
Willem
on 11 Nov 2013
Walter Roberson
on 11 Nov 2013
There are multiple ways to proceed. The way that is closest to how you have set up your code at the moment is to loop over the unique words, and for each of them loop over the reviews, counting the number of times that word occurs in that review, and setting the entry at (word_number, review_number) appropriately.
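The loop described here might look like the following minimal sketch. It assumes docArray holds one review per column, padded with empty cells as in the question; the two-review sample data is made up for illustration. For comparison, the question's code assigns wordCount(j), a single linear index that only ever lands in the first column, and its counter runs over all of data, which would explain why the totals end up in column 1.

```matlab
% Hypothetical sample data: one review per column, padded with [] cells
docArray = cell(3, 2);
docArray(1:3, 1) = {'good'; 'film'; 'good'};
docArray(1:2, 2) = {'bad'; 'film'};

% Gather all non-empty words and find the unique ones
isWord = cellfun(@ischar, docArray);
uniqueWords = unique(docArray(isWord));

nWords = numel(uniqueWords);
nReviews = size(docArray, 2);
wordCount = zeros(nWords, nReviews);

for j = 1:nWords
    for k = 1:nReviews
        thisreview = docArray(:, k);
        % Drop the padding cells before comparing strings
        thisreview = thisreview(cellfun(@ischar, thisreview));
        % Two subscripts: row = word number, column = review number
        wordCount(j, k) = sum(strcmp(uniqueWords{j}, thisreview));
    end
end
disp(wordCount)
```

Using strcmp against the whole review replaces the innermost counting loop, but the structure is otherwise the same as the original: one pass per unique word, one pass per review.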
Willem
on 11 Nov 2013
Willem
on 11 Nov 2013
Willem
on 14 Nov 2013
8 comments
Walter Roberson
on 14 Nov 2013
Hint:
[tf, idx] = ismember(CellString1, CellString2);
Then think about how you might use idx to do counting.
Willem
on 14 Nov 2013
Walter Roberson
on 14 Nov 2013
No, I mean ismember(). Look at the documentation for it, and see how it might help you vectorize.
Willem
on 14 Nov 2013
Walter Roberson
on 14 Nov 2013
Suppose you already knew for sure that everything in the first cellstring could be found in the second cellstring?
Willem
on 14 Nov 2013
Walter Roberson
on 14 Nov 2013
Suppose you switched around the two cellstrings?
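One possible reading of these hints, offered as a sketch rather than as the answer Walter necessarily had in mind: every word of a review is guaranteed to appear in uniqueWords, so calling ismember with the review first and uniqueWords second returns, for each word of the review, its row in uniqueWords. Those row numbers can then be tallied with accumarray. The sample data and the reviews variable below are assumptions made for illustration.

```matlab
% Hypothetical sample data standing in for the poster's variables
uniqueWords = {'bad'; 'film'; 'good'};
reviews = {{'good'; 'film'; 'good'}, {'bad'; 'film'}};   % one cellstr per review

nWords = numel(uniqueWords);
nReviews = numel(reviews);
wordCount = zeros(nWords, nReviews);

for k = 1:nReviews
    % idx(i) is the row of uniqueWords that matches word i of review k
    [tf, idx] = ismember(reviews{k}, uniqueWords);
    % Tally how often each row index occurs; force nWords rows in the result
    wordCount(:, k) = accumarray(idx(tf), 1, [nWords, 1]);
end
disp(wordCount)
```

Passing the size [nWords, 1] to accumarray keeps every column the same height even when a review is missing some of the unique words.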
Willem
on 15 Nov 2013
