Journal: International Journal of Advances in Soft Computing and Its Applications
Print ISSN: 2074-8523
Year: 2019
Volume: 11
Issue: 3
Pages: 108-123
Publisher: International Center for Scientific Research and Studies
Abstract: Social media, as a means of communicating in cyberspace, continues to grow in number of users, in usage, and in resulting impact. Existing social media ecosystems are shaped by public figures, trending topics, and even spam and spammers. Spam account detection has mostly been done with classification, that is, supervised learning. This becomes a problem when the data is new and the supervised model has not been updated, which increases the likelihood of false detections. To address this problem, this study uses Principal Component Analysis (PCA) and K-means clustering with Mahalanobis distance to detect groups of users with similar properties and thereby identify spam. The study uses 150 thousand Twitter data items together with data on 15 thousand accounts, described as graph data. We find that detection errors in classification-based spam detection arise because only two classes are defined, spam and non-spam, even though there are other classes of accounts that have the characteristics of spam without being spam. In this paper, we define five clusters: normal, news account and public activist, foreign account, public figure, and spam.
Keywords: K-means; Principal Component Analysis (PCA); Social Media; Social Network Analysis; Spam.
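
The pipeline named in the abstract (PCA followed by K-means with Mahalanobis distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature matrix is synthetic, the function names `pca` and `mahalanobis_kmeans` are hypothetical, and the use of a single covariance matrix shared by all clusters is an assumption, since the abstract does not say how the covariance is estimated.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def assign(X, centers, VI):
    """Label each point with the index of its nearest center (squared Mahalanobis distance)."""
    diff = X[:, None, :] - centers[None, :, :]          # shape (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diff, VI, diff)    # shape (n, k)
    return d2.argmin(axis=1)

def mahalanobis_kmeans(X, k, n_iter=50, seed=0):
    """K-means in which assignment uses Mahalanobis distance.

    Assumption: one covariance matrix estimated from all of X is shared
    by every cluster; per-cluster covariances are another possible choice.
    """
    rng = np.random.default_rng(seed)
    VI = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers, VI)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers, VI), centers

# Synthetic stand-in for per-account features (hypothetical, not the paper's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))

Z = pca(X, n_components=2)                   # reduce to 2 components
labels, centers = mahalanobis_kmeans(Z, k=5) # 5 clusters, as in the abstract
```

With a shared covariance, this is equivalent to running ordinary K-means on whitened data; the resulting cluster indices carry no meaning by themselves, so mapping them to categories such as "public figure" or "spam" would still require inspecting each cluster.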