How to deal with imbalanced dataset classification by support vector machine

I have a dataset that is heavily skewed toward one class. Training a support vector machine (SVM) with either fitcsvm.m or fitcecoc.m does not give desirable results. The accuracy for the class with more samples is above 90%, but for the class with far fewer samples it is barely 70%. Is there any way to improve the SVM training? Or are there other methods that can be used to handle training on imbalanced data?

Accepted Answer

Aditya Mittal on 21 Apr 2020
Hi,
There are several ways to balance the dataset before fitting the classifier, which can give better results. These methods are as follows:
  • Undersampling: Remove redundant or repeated points from the majority class and keep only a subset of the useful ones. This brings the classes closer to balance.
  • Oversampling: Try to collect more data points for the minority class, or replicate some of its existing points to increase its cardinality.
  • Generate Data: Generate synthetic data for the minority class to balance the dataset. This can be done with the SMOTE (Synthetic Minority Over-sampling Technique) method. A File Exchange implementation is linked below:
  • https://www.mathworks.com/matlabcentral/fileexchange/38830-smote-synthetic-minority-over-sampling-technique
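To make the three options above concrete, here is a minimal sketch in plain Python (chosen only for illustration, it is not MATLAB code and not the File Exchange implementation). The function names `undersample`, `oversample` and `smote_like`, the toy 2-D data and the choice of k = 3 neighbours are all assumptions of mine. The last function only imitates the core idea of SMOTE, interpolating between a minority point and one of its nearest minority neighbours.

```python
import random

def undersample(majority, n, rng):
    """Keep a random subset of n majority-class points (no replacement)."""
    return rng.sample(majority, n)

def oversample(minority, n, rng):
    """Grow the minority class to n points by duplicating random points."""
    extra = rng.choices(minority, k=n - len(minority))
    return minority + extra

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a
    minority point and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + t * (m - b) for b, m in zip(base, mate)))
    return synthetic

# Toy data (hypothetical): 90 majority points and 10 minority points in 2-D.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]

fewer_majority = undersample(majority, len(minority), rng)
more_minority = oversample(minority, len(majority), rng)
synthetic_minority = minority + smote_like(minority, n_new=80, k=3, rng=rng)

print(len(fewer_majority), len(more_minority), len(synthetic_minority))
```

Undersampling discards information from the majority class, while plain duplication adds no new information about the minority class, so which option works best depends on the data.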
The results vary according to the problem. Also, accuracy is not always the best performance metric when evaluating imbalanced data, so you should try other performance metrics that can give better insight:
  • Confusion matrix
  • Precision
  • Recall
  • F1 score
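As a rough illustration of why these metrics matter, the sketch below computes them from true and predicted labels in plain Python. The helper names and the label vectors are made up for this example (90 majority and 10 minority samples, with 7 of the 10 minority samples detected and 5 false alarms); they are not results from any real model.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels: 1 is the minority class, 0 is the majority class.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 85

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=1)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"confusion counts: tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

With these made-up labels the overall accuracy is 0.92, yet the minority-class recall is only 0.70 and its precision is about 0.58, which the accuracy figure alone hides.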
You can also try fitting the data with other machine learning models, such as hybrid or ensemble algorithms (e.g. AdaBoost), or with deep learning models, which may give better results.
  4 comments
Kenta on 11 Jul 2020
The answer from Dr. Aditya Mittal is very informative. The example of oversampling is posted here. I hope it helps you.
Esmeralda Ruiz Pujadas on 22 Mar 2023
You cannot use those methods directly, because resampling the whole dataset also touches the validation data. Also, SVM is different from deep learning: you cannot specify the validation set directly in SVM....
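One way to read this point is: hold out the validation data first, then resample only the training portion. The plain-Python sketch below shows that ordering on label indices. The helper names, the 20% hold-out fraction and the toy labels are assumptions for illustration, not part of any MATLAB API.

```python
import random
from collections import Counter

def stratified_holdout(labels, val_fraction, rng):
    """Return (train_idx, val_idx), holding out val_fraction of each class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, val_idx = [], []
    for idx in by_class.values():
        idx = idx[:]  # copy so the shuffle does not alter by_class
        rng.shuffle(idx)
        n_val = max(1, round(val_fraction * len(idx)))
        val_idx += idx[:n_val]
        train_idx += idx[n_val:]
    return train_idx, val_idx

def oversample_indices(train_idx, labels, rng):
    """Duplicate training indices of smaller classes up to the largest class."""
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced += idx + rng.choices(idx, k=target - len(idx))
    return balanced

# Hypothetical labels: 90 majority samples and 10 minority samples.
rng = random.Random(0)
labels = ["majority"] * 90 + ["minority"] * 10

train_idx, val_idx = stratified_holdout(labels, 0.2, rng)
balanced_train_idx = oversample_indices(train_idx, labels, rng)

train_counts = Counter(labels[i] for i in balanced_train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print("training after oversampling:", dict(train_counts))
print("validation (untouched):", dict(val_counts))
```

The validation set keeps the original 18 to 2 class ratio, so metrics computed on it reflect the real class distribution, while the classifier trains on a balanced 72 to 72 set.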
