How to Index Most Common/Popular Str Patterns by Str Length?
In brief, I am trying to find the most common substring patterns across a large string array. Moreover, I want to identify the most common substrings at each length. Please see the example below.
Ex.
MyStr = ["hello" "yellow" "teller" "mellow"]
Using MyStr, my desired output is "ellow" for 5 characters, "ello" for 4 characters, and "ell" for 3 characters.
Note: "ello" does NOT appear in every word - I am interested only in frequency. If possible, I would prefer to output the 1st, 2nd, 3rd, etc. most popular substrings at each character increment/limit (i.e. 3 character length, 5 character length etc.).
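One way to approach the request above is to enumerate every substring of a given length and tally the results. The following is only a minimal sketch under stated assumptions: it counts every occurrence (so a substring that repeats inside one word is counted more than once), and the names `topN`, `subs`, `ranked`, and `counts` are illustrative, not from the original post.

```matlab
% Sketch: rank substrings of each length k by how often they occur.
MyStr = ["hello" "yellow" "teller" "mellow"];
topN  = 2;                                   % assumed: how many ranks to report per length

for k = 3:5
    subs = strings(0, 1);                    % all substrings of length k
    for w = MyStr                            % w is one string per iteration
        c = char(w);                         % index characters via a char vector
        for s = 1:(numel(c) - k + 1)
            subs(end+1, 1) = string(c(s:s+k-1)); %#ok<AGROW>
        end
    end

    [u, ~, j] = unique(subs);                % distinct substrings, index of each entry
    counts = accumarray(j, 1);               % occurrences per distinct substring
    [counts, order] = sort(counts, 'descend');
    ranked = u(order);                       % ranked(1) is the most frequent

    n = min(topN, numel(ranked));
    fprintf('length %d: %s\n', k, ...
        strjoin(ranked(1:n) + " (" + counts(1:n) + ")", ", "));
end
```

For the four example words this ranks "ell" first at length 3, "ello" first at length 4, and "ellow" first at length 5. Growing `subs` in a loop is fine for a small array; for a large one, preallocating or building the substrings per word and concatenating once would likely be faster.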
User Paramjeet Panwar suggested the code below on a related question; however, histc returns an error: "First Input must be a real non-sparse numeric array."
a = unique(myStr);
n = histc(myStr,a);
[n,idx] = sort(n);
myFreq = a(idx);
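The error arises because histc accepts only numeric input, while myStr is a string array. A possible workaround, shown here as a sketch rather than a definitive fix, is to take the index output of unique and count with accumarray. The sample data below is hypothetical (it includes a repeated word so the counts differ), and the sort is descending so the most frequent string comes first, unlike the ascending sort in the snippet above.

```matlab
% Sketch: frequency of whole strings without histc.
myStr = ["hello" "yellow" "hello" "mellow"];   % hypothetical data with one repeat

[a, ~, j] = unique(myStr);       % a: distinct strings; j: position of each element in a
n = accumarray(j(:), 1);         % n(i) = number of times a(i) occurs in myStr
[n, idx] = sort(n, 'descend');   % most frequent first
myFreq = a(idx);                 % distinct strings ordered by frequency

disp(myFreq(1));                 % most common string
```

Note that this counts whole array elements only; it does not look inside each word for shared substrings.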
2 Comments
Walter Roberson
on 14 Oct 2020
The shorter substrings will always be at least as common as the longer ones; the counts are equal only when every occurrence of the shorter one is part of the longer one.
Should your code keep track of substrings of various lengths and count only the longest applicable substring?
signalsandsystemsishard
on 14 Oct 2020
Answers (0)