Journal name: Faculty of Computer Science and Information Technology
Publication year: 2009
Volume: 0
Issue: 0
Language: English
Publisher: Faculty of Computer Science and Information Technology
Abstract: Spam mail classification separates spam mail from non-spam mail (commonly called ham or legitimate mail). It saves time and cost by removing spam mail from the inbox, so an effective classification method is required. The decision tree is one method that can be used for spam mail classification, and it has undergone considerable development; ID3 and C4.5 are two of the resulting decision tree algorithms. The purpose of this study is to compare the performance of the ID3 and C4.5 algorithms in classifying spam mail. The comparison reports the precision, recall, and accuracy obtained on the data used. The data are taken from the UCI Machine Learning Repository and comprise 4601 emails: 1813 spam and 2788 non-spam. The retrieved data are converted into categorical data using a frequency distribution technique. Both algorithms are tested with the Weka (Waikato Environment for Knowledge Analysis) 3-4 tool. The performance measurements show that the ID3 algorithm performs better than the C4.5 algorithm.
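The three measures named in the abstract can be illustrated with a minimal sketch, assuming spam is treated as the positive class and that each measure is derived from the usual confusion-matrix counts. The function name `classification_metrics` and the counts below are hypothetical; the counts are chosen only so that the class totals match the data set (1813 spam, 2788 non-spam) and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) as percentages.

    tp: spam correctly flagged as spam
    fp: non-spam wrongly flagged as spam
    fn: spam wrongly passed as non-spam
    tn: non-spam correctly passed as non-spam
    """
    # Precision: share of mails flagged as spam that really are spam.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual spam mails that were flagged.
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all mails classified correctly.
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts (NOT results from the study); only the class
# totals, 1813 spam and 2788 non-spam, come from the abstract.
tp, fn = 1700, 113
tn, fp = 2700, 88

precision, recall, accuracy = classification_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f}%  recall={recall:.2f}%  accuracy={accuracy:.2f}%")
```

Running the same function on the confusion matrix of each classifier would give the per-algorithm percentages that a comparison of this kind reports.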