Journal: IAENG International Journal of Computer Science
Print ISSN: 1819-656X
Online ISSN: 1819-9224
Year: 2019
Volume: 46
Issue: 2
Pages: 178-191
Publisher: IAENG - International Association of Engineers
Abstract: At present, more and more crimes are carried out by e-mail. An offender's e-mail often contains traces and evidence of the criminal process; although such a message is usually very short, the evidence it contains can be clear. How to turn it into reliable evidence and identify its author is therefore an urgent problem. In this paper, based on reasonable hypotheses, we establish a mathematical model to solve this problem by combining the analytic hierarchy process (AHP), a support vector machine (SVM) classification model, and statistical analysis. Based on the extracted textual language features, we use MySQL to filter the message set and select some representative samples. By analyzing the text, we derive five representative features (word frequency, syntax structure, sentence length, format, and punctuation), which make up the linear space vector set. We use an improved term frequency–inverse document frequency (TF-IDF) algorithm to calculate the weight of each word and use AHP to re-weight the five features. The vector space model is then used to obtain the feature vector of each message. For classification, the previously obtained vector set serves as the experimental sample, a multi-class SVM is used as the final classification model, and cross-validation is used to determine the model parameters. The dataset is randomly partitioned, with 80% used as the training set and 20% as the test set. Experimental results show that the accuracy is more than 95%.
Keywords: e-mail author identification; support vector machine; term frequency–inverse document frequency; analytic hierarchy process
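
The two weighting steps named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: it uses the standard TF-IDF formula (the paper's "improved" variant is not specified in this record), it approximates the AHP priority vector with normalized row geometric means, and the function names `tf_idf` and `ahp_weights`, the toy messages, and the 3x3 comparison matrix are all hypothetical. Only three of the five features are compared, to keep the example short.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF per document: tf = count / len(doc), idf = log(N / df).

    Assumes every document is a non-empty list of tokens.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def ahp_weights(matrix):
    """Approximate AHP priorities: row geometric means, normalized to sum to 1."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]


# Hypothetical tokenized messages (not data from the paper).
emails = [["send", "money", "now"], ["meeting", "now"], ["send", "report"]]
term_weights = tf_idf(emails)

# Hypothetical pairwise comparisons for three features, e.g. word frequency,
# syntax structure, sentence length; entry [i][j] says how much more
# important feature i is judged to be than feature j.
pairwise = [
    [1, 3, 5],
    [1 / 3, 1, 2],
    [1 / 5, 1 / 2, 1],
]
feature_weights = ahp_weights(pairwise)
```

In a pipeline like the one described, the per-term weights and the AHP feature weights would be combined into one feature vector per message before training the multi-class SVM; how the paper combines them is not stated in this record.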