- Process large text files to find unique words and their frequencies.
- Visually represent those word frequencies when there are thousands of unique words.
Visual representation of word frequencies in a large amount of text
I am working on text mining. I have some text files that contain millions of words, and I want to determine their word frequencies. I have two problems:
- How do I process large data in MATLAB to find the unique words and their occurrences in any text document (containing millions of words)?
- After finding the unique words and their occurrences, how do I represent them in a circos/pie chart or any other graphical representation (as there can be thousands of unique words)?
Answers (1)
Samayochita
on 18 Jun 2025
Hi moin khan,
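I understand that you are working on large-scale text mining in MATLAB and need to count the unique words in files with millions of words, and then visualize thousands of word frequencies.
To process large text data efficiently in MATLAB: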
Step 1: Read large files
fileread loads the entire file into memory at once, so it suits moderately large files:
textData = fileread('largeTextFile.txt'); % Suitable for moderately large files
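For very large files, read the file incrementally instead, for example one line at a time with fopen and fgetl:
fid = fopen('largeTextFile.txt', 'r');
if fid == -1
    error('Cannot open largeTextFile.txt');
end
tline = fgetl(fid);          % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    % tokenize tline here and update the running word counts
    tline = fgetl(fid);
end
fclose(fid);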
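The line-by-line loop reads the file but leaves the counting as a placeholder. A minimal sketch, assuming you only want the vocabulary and its counts in memory: the hypothetical helper countWordsStreaming (and its local function mergeBatch; neither is a MATLAB built-in) reads lines in batches with fgetl, cleans and tokenizes each batch as in Step 2, and merges the batch counts into running totals with unique and accumarray as in Step 3. The default batch size of 10000 lines is an arbitrary assumption.

```matlab
function [uniqueWords, counts] = countWordsStreaming(filename, linesPerBatch)
% countWordsStreaming  Hypothetical helper (not a built-in): count word
% frequencies without loading the whole file. Memory use grows with the
% vocabulary size rather than with the file size.
if nargin < 2
    linesPerBatch = 10000;   % assumed batch size; tune for your machine
end
fid = fopen(filename, 'r');
if fid == -1
    error('countWordsStreaming:open', 'Cannot open %s', filename);
end
cleaner = onCleanup(@() fclose(fid));   % closes the file even on error

uniqueWords = cell(0, 1);   % running vocabulary (column cell array)
counts = zeros(0, 1);       % running counts, aligned with uniqueWords
batch = cell(linesPerBatch, 1);
n = 0;
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not a char) at end of file
    n = n + 1;
    batch{n} = tline;
    if n == linesPerBatch
        [uniqueWords, counts] = mergeBatch(batch(1:n), uniqueWords, counts);
        n = 0;
    end
    tline = fgetl(fid);
end
if n > 0                     % flush the last, partially filled batch
    [uniqueWords, counts] = mergeBatch(batch(1:n), uniqueWords, counts);
end
end

function [uniqueWords, counts] = mergeBatch(lines, uniqueWords, counts)
% Hypothetical local function: tokenize one batch of lines and add its
% word counts to the running totals.
joined = lower(regexprep(strjoin(lines(:).', ' '), '[^\w\s]', ''));
words = split(joined);
words = words(~cellfun('isempty', words));
if isempty(words)
    return
end
[batchWords, ~, idx] = unique(words);
batchCounts = accumarray(idx, 1);
% Stack old and new vocabularies, then sum the counts per unique word.
[uniqueWords, ~, j] = unique([uniqueWords; batchWords]);
counts = accumarray(j, [counts; batchCounts]);
end
```

Joining whole lines with a space keeps words from being split or merged across batch boundaries, and re-running unique on the merged vocabulary keeps each batch step vectorized instead of looping over individual words.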
Step 2: Clean and tokenize the text (cleaning is optional but recommended)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', '')); % remove punctuation
words = split(cleanedText); % tokenize
words = words(~cellfun('isempty',words)); % remove empty strings
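To see what the cleaning step does, here is a small sketch on a hypothetical sample string standing in for textData. Note that the pattern '[^\w\s]' also strips apostrophes and hyphens, so "don't" becomes "dont"; adjust the pattern if that matters for your analysis.

```matlab
% Hypothetical sample text; in practice this is textData from Step 1.
sampleText = 'Hello, world! Hello again.';
cleaned = lower(regexprep(sampleText, '[^\w\s]', ''));  % 'hello world hello again'
tokens = split(cleaned);                                % N-by-1 cell array of words
tokens = tokens(~cellfun('isempty', tokens));           % drop empty tokens
disp(tokens)
```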
Step 3: Count word frequencies
Use unique together with accumarray, or use tabulate (which requires the Statistics and Machine Learning Toolbox).
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);
OR
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words)
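With thousands of unique words it usually helps to rank them before plotting. A minimal sketch reusing the unique/accumarray result: sort the counts in descending order and keep only the N most frequent words. N = 2 is an arbitrary illustrative value.

```matlab
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);   % uniqueWords is sorted alphabetically
counts = accumarray(idx, 1);             % counts(k) = occurrences of uniqueWords{k}
[sortedCounts, order] = sort(counts, 'descend');
rankedWords = uniqueWords(order);        % words ordered from most to least frequent
N = min(2, numel(rankedWords));          % keep only the N most frequent words
topWords = rankedWords(1:N);             % {'cat', 'dog'}
topCounts = sortedCounts(1:N);           % [3; 2]
```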
Step 4: Visualize word frequencies using word cloud
A word cloud, in which each word's size reflects its frequency, works well for hundreds or thousands of unique words:
wordcloud(uniqueWords, counts);
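A pie or circular chart with thousands of slices is hard to read, so one option is to pair the word cloud with a bar chart of only the most frequent words. A minimal sketch, assuming hypothetical sample data in place of the real uniqueWords and counts, and an arbitrary topN = 3:

```matlab
% Hypothetical data; in practice use uniqueWords and counts from Step 3.
uniqueWords = {'bird', 'cat', 'dog', 'fish', 'horse'};
counts = [1; 3; 2; 5; 4];
[sortedCounts, order] = sort(counts, 'descend');
topN = min(3, numel(order));
topWords = uniqueWords(order(1:topN));

figure
wc = wordcloud(uniqueWords, counts);
wc.MaxDisplayWords = 200;   % by default only the 100 largest words are shown

figure
barh(sortedCounts(topN:-1:1));       % reversed so the largest bar is at the top
yticks(1:topN);
yticklabels(topWords(topN:-1:1));
xlabel('Frequency');
title(sprintf('Top %d words', topN));
```

The bar chart shows exact counts for the leading words, while the word cloud gives an overview of the wider vocabulary.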
Please refer to the MATLAB documentation for fileread, fopen, fgetl, regexprep, split, unique, accumarray, tabulate, and wordcloud for more information.
Hope this is helpful!