Abstract: A support vector machine (SVM) transforms a classification problem into a quadratic programming problem in order to optimize the separating hyperplane. When it handles large amounts of data, however, the number of features becomes excessive, which leads to conflicting samples and increases the complexity of the SVM classifier. To improve the processing speed and performance of SVMs on large data sets, this paper proposes a relative attribute reduction scheme and an improved heuristic value reduction scheme. Drawing on Rough Set (RS) theory, the scheme eliminates redundant attributes and redundant attribute values from the sample data, and adopts a statistical rough set algorithm with a relaxation factor to generate decision rules, so that the rules are more concise and reliable. In addition, this paper verifies that the CLBT-SVM (cluster-based binary tree SVM) algorithm plays a prominent role in mitigating model overfitting. Based on CLBT-SVM and RS, the hybrid algorithm RS-CLBT-SVM is also proposed. The experimental results show that the complexity of the RS-CLBT-SVM algorithm is greatly reduced. Compared with the traditional SVM algorithm, its training time is shorter and its classification speed is faster, while classification accuracy is maintained.
Keywords: Rough Set; SVM; Attribute Reduction; Value Reduction; Text Classification; Binary Tree SVM
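
As an illustration of what attribute reduction in rough set theory means, the following is a minimal sketch in Python. It is not the relative attribute reduction scheme proposed in this paper: it assumes a simple greedy strategy that drops an attribute whenever the positive region of the decision attribute is unchanged. The function names `positive_region_size` and `greedy_reduct` and the toy decision table are hypothetical and serve only as an example.

```python
from collections import defaultdict


def positive_region_size(rows, attrs, decision):
    """Count the rows whose equivalence class under `attrs`
    maps to exactly one decision value (the positive region)."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    return sum(len(v) for v in classes.values() if len(set(v)) == 1)


def greedy_reduct(rows, attrs, decision):
    """Assumed greedy strategy: try removing each attribute in order and
    keep the removal if the positive region size does not change."""
    full = positive_region_size(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if positive_region_size(rows, trial, decision) == full:
            kept = trial
    return kept


# Hypothetical decision table: "label" depends only on attribute "b".
rows = [
    {"a": 0, "b": 0, "c": 1, "label": "x"},
    {"a": 0, "b": 1, "c": 1, "label": "y"},
    {"a": 1, "b": 0, "c": 0, "label": "x"},
    {"a": 1, "b": 1, "c": 0, "label": "y"},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "label")
print(reduct)
```

In this toy table, attributes `a` and `c` can be removed without making any sample ambiguous, so only `b` is retained. The result of a greedy pass depends on the order in which attributes are tried, which is one reason heuristic reduction schemes are of interest.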