How to get probabilities of each class which is classified with RUSBoost for an imbalanced data set
Dilshan Subasinghe
on 24 Apr 2020
Commented: Louis
on 6 Nov 2023
I have a dataset with 7 classes and 3 features. The dataset is heavily imbalanced, so I followed https://www.mathworks.com/help/stats/classification-with-imbalanced-data.html to classify the data. I get a prediction accuracy of 94%, but I need the probability of each class for a given feature or set of features. How can I get the probability of each class for a given feature vector?
Nt = size(y,1); % Number of observations in the training sample
t = templateTree('MaxNumSplits',Nt); % Allow up to Nt splits per tree
rusTree = fitcensemble(X,y,'Method','RUSBoost','NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'NPrint',100); % Print progress every 100 cycles
[~,scores] = predict(rusTree,[1 16 3 5]) % Second output holds the per-class scores
This code returns the following scores: 0.7345, 3.5105, 1.1893, 0, 0, 0, 0.0082
But these scores are not probabilities. How can I get values between 0 and 1 such that the probabilities over all classes sum to 1?
Accepted Answer
Raunak Gupta
on 29 Apr 2020
Edited: Raunak Gupta
on 29 Apr 2020
Hi,
predict does not return probability estimates because the 'RUSBoost' algorithm used in the model does not treat scores as probabilities. Instead, each score represents the confidence of a classification into a class, with a higher score indicating more confidence, as explained in the documentation for fitcensemble.
If you would like probabilistic estimates, you can set 'ScoreTransform' to 'logit' in fitcensemble. This name-value pair transforms the scores into probability estimates. This is explained here. Calling predict on the model then returns scores as probability values for each class.
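As a rough illustration of the suggestion above, the sketch below trains a small RUSBoost ensemble with 'ScoreTransform' set to 'logit' and inspects the returned scores. The fisheriris sample data, the seed, and the small tree and ensemble sizes are assumptions chosen only to keep the example short; they are not the poster's data or settings. Because 'logit' is applied to each score separately, it is worth checking the row sums rather than assuming they equal 1.

```matlab
% Sketch only: fisheriris stands in for the poster's data (an assumption).
load fisheriris                        % provides meas (150x4) and species (150x1 cell)
rng(0);                                % make the random undersampling reproducible
t = templateTree('MaxNumSplits', 10);  % small trees, chosen only for speed
Mdl = fitcensemble(meas, species, 'Method', 'RUSBoost', ...
    'NumLearningCycles', 50, 'Learners', t, 'LearnRate', 0.1, ...
    'ScoreTransform', 'logit');        % applies 1./(1+exp(-s)) to each raw score

[labels, scores] = predict(Mdl, meas(1:5, :));

% Each transformed score lies in [0,1] ...
inUnitInterval = all(scores(:) >= 0 & scores(:) <= 1);
% ... but the transform is element-wise, so inspect the row sums directly.
rowSums = sum(scores, 2);
disp(inUnitInterval)
disp(rowSums)
```

The transform can also be changed on an already trained model with Mdl.ScoreTransform = 'logit', which avoids retraining.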
2 Comments
Siddharth Arora
on 27 Feb 2022
Hi Raunak,
I have tried the suggested approach of setting ScoreTransform to logit in fitcensemble (for a binary classification problem), and the scores are still not probabilistic estimates. I tried specifying 'ScoreTransform' as 'logit' in fitcensemble, and also tried setting Mdl.ScoreTransform = 'logit' before calling predict, and the scores in any given row do not add up to 1. I tried 'doublelogit' for AdaBoost and that works fine, but not for RUSBoost. Please let me know how else I could convert scores from RUSBoost to probabilistic estimates. Is it right to use scores from RUSBoost as inputs to perfcurve to get AUC values, or should the scores be transformed first? Thank you.
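On the perfcurve part of this question, one point can be checked directly: AUC depends only on how the scores rank the observations, so a strictly increasing transform such as logit should leave it unchanged. The sketch below uses synthetic labels and scores (an assumption, not output from any RUSBoost model) to compare the AUC before and after the transform.

```matlab
% Sketch only: synthetic binary labels and scores, not real RUSBoost output.
rng(1);
labels = [ones(50,1); zeros(50,1)];        % 1 = positive class, 0 = negative class
raw = [randn(50,1) + 1; randn(50,1)];      % hypothetical raw positive-class scores
logitScores = 1 ./ (1 + exp(-raw));        % the same scores after a logit transform

% perfcurve takes the true labels, the scores, and the positive class label.
[~, ~, ~, aucRaw]   = perfcurve(labels, raw, 1);
[~, ~, ~, aucLogit] = perfcurve(labels, logitScores, 1);
fprintf('AUC raw: %.4f, AUC logit: %.4f\n', aucRaw, aucLogit);
```

This only shows that the AUC value is insensitive to such a transform; it says nothing about whether the scores are calibrated probabilities.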
Louis
on 6 Nov 2023
I am experiencing exactly the same issue as Siddharth Arora above. Setting 'ScoreTransform' to 'logit' ensures that the score outputs are below 1, but they do not sum to 1.
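One possible workaround, not proposed anywhere in this thread, is to normalize each row of raw scores with a softmax so that the values lie between 0 and 1 and sum to 1. The sketch below applies this to the raw scores quoted in the question. Softmax is a swapped-in technique here: the result behaves like a normalized confidence and should not be read as a calibrated class probability.

```matlab
% Raw RUSBoost scores quoted in the question (one observation, 7 classes).
raw = [0.7345 3.5105 1.1893 0 0 0 0.0082];

% Softmax over each row; subtracting the row maximum avoids overflow in exp.
shifted = raw - max(raw, [], 2);
p = exp(shifted) ./ sum(exp(shifted), 2);

disp(p)           % normalized scores, one column per class
disp(sum(p, 2))   % each row sums to 1 up to rounding
```

The ordering of the classes is preserved, so the predicted label does not change. The documentation describes ScoreTransform as also accepting a function handle, so the same expression could presumably be attached to the model itself, for example Mdl.ScoreTransform = @(s) exp(s - max(s,[],2)) ./ sum(exp(s - max(s,[],2)), 2); treat that as an untested assumption.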