Visually representing word frequencies for a large amount of text

moin khan on 23 Mar 2021
Answered: Samayochita on 18 Jun 2025
I am working on text mining. I have some text files that contain millions of words, and I want to determine their word frequencies. I have two problems:
  1. How do I process large data in MATLAB to find the unique words and their occurrences in any text document (containing millions of words)?
  2. After finding the unique words and their occurrences, how do I represent them graphically, e.g. as a Circos plot or pie chart (the unique words can number in the thousands)?

Answers (1)

Samayochita on 18 Jun 2025
Hi moin khan,
I understand that while working on large-scale text mining in MATLAB, the goal is to:
  1. Process large text files to find unique words and their frequencies.
  2. Visually represent those word frequencies when there are thousands of unique words.
To efficiently process large text data in MATLAB:
Step 1: Read large files
Read the file with fileread (which loads the whole file into memory at once) or with fopen and fgetl (which reads one line at a time).
textData = fileread('largeTextFile.txt'); % Suitable for moderately large files
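For very large files, read line by line instead so the whole file never has to sit in memory:
fid = fopen('largeTextFile.txt','r');
assert(fid ~= -1, 'Could not open largeTextFile.txt');
while true
    tline = fgetl(fid); % returns numeric -1 at end of file
    if ~ischar(tline), break; end
    % process tline
end
fclose(fid);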
Step 2: Tokenize text and clean it (optional but preferred)
Break the text into words, convert to lowercase, remove punctuation, etc.
cleanedText = lower(regexprep(textData, '[^\w\s]', '')); % remove punctuation
words = split(cleanedText); % tokenize
words = words(~cellfun('isempty',words)); % remove empty strings
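As a quick check of what Step 2 produces, here is the same cleaning applied to a short made-up sentence (the sample string is only an illustration, not from the original question). Note that in MATLAB regular expressions \w matches only [a-zA-Z_0-9], so this pattern would also strip accented or other non-ASCII letters; that may or may not be what you want for your corpus.

```matlab
% Hypothetical sample text, standing in for the contents of a real file
textData = 'The cat, the dog; and THE bird!';

% Remove everything that is not a word character or whitespace, then lowercase
cleanedText = lower(regexprep(textData, '[^\w\s]', ''));

% Split at whitespace and drop any empty tokens
words = split(cleanedText);
words = words(~cellfun('isempty', words));

disp(cleanedText)   % the cat the dog and the bird
disp(numel(words))  % 7
```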
Step 3: Count word frequencies
Use the unique and accumarray functions, or the tabulate function (tabulate requires Statistics and Machine Learning Toolbox).
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx, 1);
OR
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
T = tabulate(words)
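If the file is too large to tokenize in one go, the line-by-line loop from Step 1 can be combined with the counting from Step 3 by keeping running totals. The following is only a rough sketch under a few assumptions of mine: it uses a containers.Map for the totals (one possible choice, not the only one), it writes a tiny demo file to tempdir so that it runs as-is, and the variable names freq, tline and tokens are made up for the illustration. Replace fname with the path to your own file.

```matlab
% Create a small demo file so the sketch is runnable (hypothetical content)
fname = fullfile(tempdir, 'demoText.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'The cat and the dog.\nThe bird, the cat!\n');
fclose(fid);

% Running word counts: key = word, value = number of occurrences so far
freq = containers.Map('KeyType', 'char', 'ValueType', 'double');

fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);
while true
    tline = fgetl(fid);                 % numeric -1 signals end of file
    if ~ischar(tline), break; end

    % Same cleaning as Step 2, applied to a single line
    tline = lower(regexprep(tline, '[^\w\s]', ''));
    tokens = strsplit(strtrim(tline));  % splits at whitespace, collapsing runs
    tokens = tokens(~cellfun('isempty', tokens));
    if isempty(tokens), continue; end

    % Count within the line, then merge into the running totals
    [u, ~, idx] = unique(tokens);
    c = accumarray(idx(:), 1);
    for k = 1:numel(u)
        if isKey(freq, u{k})
            freq(u{k}) = freq(u{k}) + c(k);
        else
            freq(u{k}) = c(k);
        end
    end
end
fclose(fid);

uniqueWords = keys(freq);               % cell array of words, sorted
counts = cell2mat(values(freq));        % counts in the same order as the keys
```

This trades speed for memory: only one line and the table of unique words are held at a time. Updating the map word by word is slow in MATLAB, so for real data it would probably be worth processing many lines per iteration before merging.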
Step 4: Visualize word frequencies using word cloud
A word cloud works well for hundreds or thousands of unique words, because the size of each word reflects its frequency.
wordcloud(uniqueWords, counts);
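With thousands of unique words, it often helps to sort by frequency and plot only the most frequent ones. Below is a minimal sketch using the toy data from Step 3; the cutoff N = 20 and the choice of a horizontal bar chart alongside the word cloud are my own suggestions, so adjust them to your data.

```matlab
% Toy data in the same shape as the output of Step 3
words = {'cat', 'dog', 'cat', 'bird', 'dog', 'cat'};
[uniqueWords, ~, idx] = unique(words);
counts = accumarray(idx(:), 1);

% Sort by frequency, most frequent first
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = uniqueWords(order);

% Keep at most the top N words (N = 20 is an arbitrary choice)
N = min(20, numel(sortedWords));
topWords = sortedWords(1:N);
topCounts = sortedCounts(1:N);

% Horizontal bar chart: exact counts are easier to compare than in a cloud
figure;
barh(topCounts);
set(gca, 'YTick', 1:N, 'YTickLabel', topWords, 'YDir', 'reverse');
xlabel('Occurrences');
title(sprintf('Top %d words', N));

% Word cloud restricted to the same words
figure;
wordcloud(topWords(:), topCounts(:));
```

The bar chart is useful when you need to read off actual counts, while the word cloud gives a quicker overall impression.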
Please refer to the MATLAB documentation for fileread, fgetl, unique, accumarray, tabulate, and wordcloud for more information.
Hope this is helpful!

