Remove numbers during preprocessing

Rachele Franceschini on 30 Sep 2022
Commented: Ergin Sezgin on 30 Sep 2022
I would like to remove numbers from my text. I use the following script for preprocessing; how can I remove all numbers?
% Create co-occurrence network for only class1 and 0 5%
data = dataone.text;
%textdata = data.text;
data = randsample(data,100)
%data=data(1:100,1)
documents = preprocessText(data);
bag = bagOfWords(documents);
bag1 = removeInfrequentWords(bag,2);
counts = bag1.Counts;
cooccurrence = counts.'*counts;
G = graph(cooccurrence,bag1.Vocabulary,'omitselfloops');

Answers (1)

Ergin Sezgin on 30 Sep 2022
Hello Rachele,
Try using the following code with your string array.
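% Example row vector containing one purely numeric string
words = ["stringOne", "stringTwo", "2022", "stringThree"]
words = 1×4 string array
"stringOne" "stringTwo" "2022" "stringThree"
% Non-numeric strings convert to NaN
doubleArray = str2double(words)
doubleArray = 1×4
NaN NaN 2022 NaN
% true marks the elements that are not numbers
nanIdx = isnan(doubleArray)
nanIdx = 1×4 logical array
1 1 0 1
% Keep only the non-numeric strings
wordsArray = words(1,nanIdx)
wordsArray = 1×3 string array
"stringOne" "stringTwo" "stringThree"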
Good luck
2 Comments
Rachele Franceschini on 30 Sep 2022
Thank you for your suggestion, but I am not sure it applies to my case. I corrected my question, since the preprocessing function may not have been clear. I tried your script, but I got an error ('The logical indices in position 2 contain a true value outside of the array bounds.') on the second-to-last line.
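The error quoted above is consistent with `words` being a column vector (N-by-1) rather than a row: `words(1,nanIdx)` then applies an N-element mask to a dimension of size 1. A minimal sketch, assuming that is the cause, uses a single logical index, which works for either orientation. The sample strings are hypothetical and are not the asker's data.

```matlab
% Hypothetical column-vector input (N-by-1), unlike the 1-by-4 row in the answer
words = ["stringOne"; "stringTwo"; "2022"; "stringThree"];

% str2double returns NaN for every element that is not a plain number
isNumber = ~isnan(str2double(words));

% A single logical index works for both row and column vectors,
% whereas words(1,mask) only works when words is a row
wordsArray = words(~isNumber);
```

Note that this only drops elements that are entirely numeric. `str2double` also parses strings such as "Inf" or "1e3" as numbers, so those would be dropped too, and digits embedded in longer text are left in place.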
Ergin Sezgin on 30 Sep 2022
If the issue is with a char array, it is possible to remove all numbers from it by checking each element, either with an explicit loop or with vectorization. If there are multiple char elements in a container, the same method should also work after some additional steps. Could you please share some of the data?
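The vectorized approach mentioned above might look like the following sketch. It assumes the goal is to delete every digit inside the text rather than to drop whole elements. The sample strings are hypothetical, and `noDigits` and `txtClean` are illustrative names.

```matlab
% Hypothetical sample text; the real data would come from dataone.text
data = ["Order 66 was issued in 2022"; "room42 has 3 windows"; "no digits here"];

% Option 1: a regular expression deletes every run of digits.
% regexprep accepts string arrays and cell arrays of char vectors.
noDigits = regexprep(data, '\d+', '');

% Collapse the repeated spaces left behind and trim both ends
noDigits = strtrim(regexprep(noDigits, '\s+', ' '));

% Option 2: a vectorized logical mask on a single char array
txt = 'abc123def45';
txtClean = txt(~isstrprop(txt, 'digit'));
```

The cleaned array could then be passed on in place of the raw text, for example `documents = preprocessText(noDigits);`. Here `preprocessText` is the asker's own function, whose contents are not shown in the thread.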

Release

R2021b
