Abstract: Support Vector Machine (SVM) is one of the most popular and effective classification algorithms and has attracted much attention in recent years. As a large-margin classifier, SVM seeks the optimal separating hyperplane between two classes, which gives it outstanding generalization ability. To find this hyperplane, we commonly use most of the labeled records as the training set. However, because the separating hyperplane is determined by only a few crucial samples (the Support Vectors, SVs), there is no need to train the SVM model on the whole training set. This paper presents a novel approach based on a clustering algorithm, in which only a small subset of the original training set is selected to serve as the final training set. Our algorithm uses the K-means clustering algorithm to select the most informative samples, and the SVM classifier is then trained on those selected samples. Experiments show that our approach greatly reduces the size of the training set, thereby substantially reducing SVM training and prediction time while maintaining generalization performance.
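The pipeline described above (cluster, select a small informative subset, train the SVM on that subset only) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method. The abstract does not say how K-means identifies "informative" samples, so the selection rule here is a hypothetical stand-in: clusters containing both labels are assumed to straddle the boundary and are kept whole, while each single-label cluster is reduced to the member nearest its centroid. The helper names (`kmeans`, `select_subset`, `train_linear_svm`, `predict`), the farthest-first seeding, the toy data, and the use of a Pegasos-style stochastic subgradient solver for a linear SVM (instead of a QP solver) are all assumptions made for illustration. Only the Python standard library is used.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's K-means with deterministic farthest-first seeding (an assumption).

    Returns (cluster index per point, list of cluster centers)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers


def select_subset(X, y, k):
    """Hypothetical selection rule: keep every member of a mixed-label cluster
    (likely near the boundary, so it may hold SVs) and one representative
    (the member nearest the centroid) of each single-label cluster."""
    assign, centers = kmeans(X, k)
    keep = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue
        if len({y[i] for i in idx}) > 1:
            keep.extend(idx)
        else:
            keep.append(min(idx, key=lambda i: dist2(X[i], centers[c])))
    return sorted(keep)


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.
    A constant 1.0 feature is appended so the bias is learned as a weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: gradient step on the L2 regularization term.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge loss is active when the margin is below 1: step toward y * x.
            if margin < 1:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear decision function, using the appended bias feature."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data (hypothetical): two 5x5 grids of points centered at (2, 2) and (-2, -2).
X, y = [], []
for i in range(-2, 3):
    for j in range(-2, 3):
        X.append([2 + 0.3 * i, 2 + 0.3 * j])
        y.append(1)
        X.append([-2 + 0.3 * i, -2 + 0.3 * j])
        y.append(-1)

subset = select_subset(X, y, k=4)
w = train_linear_svm([X[i] for i in subset], [y[i] for i in subset])
accuracy = sum(predict(w, x) == label for x, label in zip(X, y)) / len(X)
print(len(subset), "of", len(X), "samples kept; accuracy on full set:", accuracy)
```

On this well-separated toy data every cluster is single-labeled, so only a handful of representatives are kept, yet the classifier trained on them still separates the full set. On real, overlapping data the mixed-cluster branch would keep more points near the boundary; the number of clusters `k` controls the trade-off between subset size and how much boundary information survives.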