Oversampling Imbalanced Data: SMOTE related algorithms

version 1.0.1 (5.24 MB) by michio
This entry provides a MATLAB implementation of SMOTE-related algorithms

33 Downloads

Updated 23 Apr 2020

This entry provides an overview and implementation of SMOTE and the following related algorithms:

- SMOTE (Chawla, NV. et al. 2002)[1]
- Borderline SMOTE (Han, H. et al. 2005)[2]
- ADASYN (He, H. et al. 2008)[3]
- Safe-level SMOTE (Bunkhumpornpat, C. et al. 2009)[4]
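
The core idea shared by these algorithms, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched as follows. This is a minimal illustration in Python, not the MATLAB code from this entry. The function name `smote_sketch` and its parameters are hypothetical, and base samples are drawn at random rather than by the per-sample loop described in the original SMOTE paper.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; assumes at least two
    minority samples, each a list of floats of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the base itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        # the new point lies on the segment between base and neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Because each synthetic point is a convex combination of two existing minority samples, it always falls inside the bounding box of the minority class. Borderline SMOTE, ADASYN, and Safe-level SMOTE differ mainly in how they choose base points and interpolation weights.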

[1]: Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357.

[2]: Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing (pp. 878-887). Springer, Berlin, Heidelberg.

[3]: He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (pp. 1322-1328). IEEE.

[4]: Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2009). Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Pacific-Asia conference on knowledge discovery and data mining (pp. 475-482). Springer, Berlin, Heidelberg.

Cite As

michio (2020). Oversampling Imbalanced Data: SMOTE related algorithms (https://github.com/minoue-xx/Oversampling-Imbalanced-Data/releases/tag/1.0.1), GitHub. Retrieved .

Comments and Ratings (3)

Hey Michio,
thanks for the implementation. Would you mind correcting your code to address the issue Ryan S mentioned? Thank you :)!

Ryan S

Great implementation, michio, thank you for providing this. One thing I noticed: the mySafeLevelSMOTE function seems to produce slightly too many synthetic data points. I think it needs a second instance of "if index > num2add then break", because the first break only exits the kk=1:T2 loop and a second break is required to exit the ii=1:T1 loop. Otherwise, it seems to keep generating synthetic points until T1 is satisfied. One caveat: I have modified this code to make it compatible with MATLAB R2017b, so your original implementation may be fine when run in MATLAB 2019 or later.
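
The nested-loop behavior described in this comment can be illustrated with a small sketch. It is written in Python rather than MATLAB and is a hypothetical reconstruction of the loop shape only, not the actual mySafeLevelSMOTE code. The function `count_generated` and the flag `break_outer` are invented for illustration; `index`, `num2add`, `T1`, and `T2` follow the names used in the comment.

```python
def count_generated(t1, t2, num2add, break_outer):
    """Count how many synthetic points a nested loop of this shape produces.

    Assumed structure: the inner loop creates one point, advances a
    1-based index, and breaks once the index passes num2add.
    """
    index = 1  # 1-based position of the next synthetic point
    generated = 0
    for ii in range(t1):
        for kk in range(t2):
            generated += 1  # stand-in for creating one synthetic point
            index += 1
            if index > num2add:
                break  # exits only the inner kk loop
        if break_outer and index > num2add:
            break  # second check: exits the outer ii loop as well
    return generated


print(count_generated(5, 3, 4, break_outer=False))
print(count_generated(5, 3, 4, break_outer=True))
```

With T1 = 5, T2 = 3, and num2add = 4, the version without the outer check produces 7 points, one extra for every remaining outer iteration, while the version with the second check stops at exactly 4. Whether the published code has this exact shape is an assumption based on the comment.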

Updates

1.0.1

See release notes for this release on GitHub: https://github.com/minoue-xx/Oversampling-Imbalanced-Data/releases/tag/1.0.1

MATLAB Release Compatibility
Created with R2019b
Compatible with any release
Platform Compatibility
Windows macOS Linux