Abstract: Most data mining systems face two main problems: a high volume of training data and uncertainty in the information. Discretization methods are used to address both. Discretizing data is also useful for automatically generating concept hierarchies for numeric data. In this paper, the InfoGainAttributeEval, GainRatioAttributeEval, and ChiSquaredAttributeEval algorithms are used for feature selection; discretization methods based on entropy, frequency, and frequency square root are applied; and the Naïve Bayes algorithm is used for classification. In terms of accuracy, precision, and recall, the results show that classification with the applied discretization methods performs better than classification without discretization, and that, among the discretization methods used in this research, the entropy-based method gives better results on these criteria.
Keywords: Data discretization; Naïve Bayes; Feature selection; KDD 99 CUP
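
As an illustration of the frequency-based discretization named in the abstract, the following is a minimal sketch of equal-frequency binning, which is one common reading of "frequency-based". The function names `equal_frequency_cut_points` and `discretize`, the choice of three bins, and the toy values are assumptions made for this example and are not taken from the paper, which may define its frequency and frequency-square-root methods differently.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return up to n_bins - 1 cut points so that each bin holds
    roughly the same number of the given values.

    Hypothetical helper for illustration; not the paper's procedure.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        # Value sitting at the k-th quantile position of the sorted data.
        cut = ordered[(k * n) // n_bins]
        # Skip repeated cut points, which arise when many values are tied.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(value, cut_points):
    """Map a numeric value to the index of the bin it falls into."""
    return bisect_right(cut_points, value)


# Toy numeric attribute (made-up values, not from KDD 99 CUP).
durations = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = equal_frequency_cut_points(durations, n_bins=3)
bins = [discretize(v, cuts) for v in durations]
```

With the toy data above, each of the three bins receives three values. The resulting bin indices can then be treated as a categorical attribute by a classifier such as Naïve Bayes.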