Abstract: Classifying imbalanced data is one of the significant challenges in machine learning. In many situations, standard classifiers fail to learn how to distinguish minority-class examples from the rest. Because many real-world problems are imbalanced, this issue has become highly relevant and is now widely studied. This paper presents a new preprocessing method based on Delaunay tessellation and the preprocessing algorithm SMOTE (Synthetic Minority Over-sampling Technique), which we call DTO-SMOTE (Delaunay Tessellation Oversampling SMOTE). DTO-SMOTE constructs a mesh of simplices (in this paper, tetrahedra) and uses it to create synthetic examples. We compare DTO-SMOTE with five preprocessing algorithms (GEOMETRIC-SMOTE, SVM-SMOTE, SMOTE-BORDERLINE-1, SMOTE-BORDERLINE-2, and SMOTE), using eight classification algorithms and 61 binary-class data sets. For some classifiers, DTO-SMOTE outperforms the other methods in terms of Area Under the ROC Curve (AUC), Geometric Mean (GEO), and Generalized Index of Balanced Accuracy (IBA).
Keywords: machine learning; SMOTE; oversampling; DTO-SMOTE; imbalanced data
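
To make the idea of creating synthetic examples from a mesh of simplices more concrete, the following minimal sketch draws new points inside a single tetrahedron. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how points are placed within a simplex, so the sketch assumes uniformly distributed barycentric weights (a Dirichlet(1, ..., 1) draw). The function name `sample_in_simplex` and the example tetrahedron are hypothetical. Building the mesh itself is omitted; in practice a library such as `scipy.spatial.Delaunay` could supply the simplices.

```python
import random


def sample_in_simplex(vertices, rng):
    """Return (point, weights): a random convex combination of the vertices.

    Assumption: weights follow Dirichlet(1, ..., 1), i.e. they are uniform
    over the simplex. The paper may weight the vertices differently.
    """
    # Normalized Exp(1) draws give Dirichlet(1, ..., 1) barycentric weights.
    raw = [rng.gammavariate(1.0, 1.0) for _ in vertices]
    total = sum(raw)
    weights = [r / total for r in raw]

    # The synthetic example is the weighted average of the simplex vertices.
    dim = len(vertices[0])
    point = [sum(w * v[d] for w, v in zip(weights, vertices)) for d in range(dim)]
    return point, weights


# Hypothetical tetrahedron whose four vertices stand in for minority-class
# examples in a 3-D feature space.
tetrahedron = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic = [sample_in_simplex(tetrahedron, rng) for _ in range(5)]
```

Because each synthetic point is a convex combination of the vertices, it always lies inside the tetrahedron. This contrasts with plain SMOTE, which interpolates along the line segment between a minority example and one of its neighbors.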