Categorical Data preprocessing for Data mining

Samuel Katongole
Samuel Katongole on 6 Oct 2021
Edited: Samuel Katongole on 6 Oct 2021
Hello friends
I have been working on the Tanzania water wells status problem for my ML practice, using the Taarifa data obtained from DrivenData, and I am now trying to remove misspellings in the installer and funder columns. If anyone has tried this, please help me with how to go about it. A faster approach would be especially helpful.
Oh, thanks
I am trying to clean misspellings out of the installer and funder columns. For the moment I am using regular expressions, but there is a lot of data and this approach is taking a long time.
For instance, to correct the misspellings of "world bank" I tried the following expression, which is still failing:
pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
I was testing the expression in Atom, but it fails to correctly replace the selected words.
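One likely cause of the failure, offered as a reading of pat11 rather than a confirmed diagnosis: the optional group ends in `$`, but it is followed by `[^vd]`, which needs one more character after the end of the text. The group therefore can never take part in a match, and `[^vd]` consumes only a single character, so 'world bank' appears to match just 'world b' and the remaining 'ank' is left behind after the replacement. A minimal sketch of an alternative that anchors the whole value is shown here. The sample values are hypothetical, and the spelling variants are simply carried over from pat11.

```matlab
% Hypothetical sample values; the real data lives in newDataClean.installer.
installer = {'world bank'; 'word bank'; 'wordl bnk'; 'would nk'; ...
             'world vision'; 'two brothers'};

% ^ and $ anchor the pattern to the whole value, so names that merely
% contain "wo ..." somewhere in the middle are left untouched.
pat = '^\s*wo(rd|rdl|uld|rld)\s+(b\w*|nk|divisio)\s*$';

% 'ignorecase' avoids having to lowercase the column first.
cleaned = regexprep(installer, pat, 'world bank', 'ignorecase');
disp(cleaned)
```

Unlike pat11, this sketch makes the first group mandatory, so a bare 'wo' followed by a space is no longer accepted. Whether that suits the real data is an assumption to check against the actual spellings.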
However, I am still wondering whether there is another, faster way of approaching the issue.
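One possible way to speed this up, sketched under assumptions rather than tested on the real dataset: a column like installer usually has far fewer distinct spellings than rows, so the regular expression can be applied to the unique values only and the result mapped back to every row. The small table below is a hypothetical stand-in. Only the names newDataClean and installer come from the question.

```matlab
% Hypothetical stand-in for the real table.
newDataClean = table({'World Bank'; 'word bank '; 'world bank'; 'Danida'; 'word bank '}, ...
    'VariableNames', {'installer'});

% Normalize case and surrounding whitespace once for the whole column.
raw = lower(strtrim(newDataClean.installer));

% Keep only the distinct spellings; idx maps each row back to its spelling.
[spellings, ~, idx] = unique(raw);

% Run the comparatively slow regex on the short list of spellings only.
pat = '^wo(rd|rdl|uld|rld)\s+(b\w*|nk|divisio)$';
fixed = regexprep(spellings, pat, 'world bank');

% Expand back to one value per row and store the column as categorical.
newDataClean.installer = categorical(fixed(idx));
disp(categories(newDataClean.installer))
```

The same steps could be repeated for the funder column. Printing `spellings` before writing any patterns also gives a compact list of the variants that actually occur, which makes it easier to decide which rules are worth writing.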
1 Comment
KSSV
KSSV on 6 Oct 2021
The question is not clear. Can you elaborate with an example?


Answers (0)
