Article Information

Title: COMPARATIVE ANALYSIS OF SUPERVISED AND UNSUPERVISED LEARNING ALGORITHMS FOR ONLINE USER CONTENT SUICIDAL IDEATION DETECTION
Authors: SERGAZY NARYNOV ; DANIYAR MUKHTARKHANULY ; ILMURAT KERIMOV et al.
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2019
Volume: 97
Issue: 22
Pages: 3304-3317
Publisher: Journal of Theoretical and Applied
Abstract: Suicide is one of the leading causes of death in most countries around the world, and one of the three most common causes of death among young people (15-24 years old), yet no methods have so far been developed for diagnosing suicidal tendencies. This makes the problem of developing methods for identifying people prone to suicidal behavior especially pressing. One direction of such research is the search for typological features of suicide-related speech using the methods of mathematical linguistics, automatic text processing, and machine learning. In research abroad, texts written by people motivated by suicide (mainly suicide notes) are studied with natural language processing and machine learning methods, and models are built to classify whether a text is related to suicide or not. To develop methods for identifying people who are prone to suicide, it is necessary to analyze not only suicide notes (which are usually short texts) but also other texts written by people who have committed suicide. The purpose of this work is to build a machine learning model, apply supervised and unsupervised learning methods, and then use comparative analysis to select the most efficient algorithm for classifying whether a text is connected to suicide. Our research contributes to the detection of depressive content that can lead to suicide, and aims to help such people reach help from psychologists of the national suicide prevention center in Kazakhstan. The best result, an F1-score of 95%, was obtained by Random Forest (supervised) with the tf-idf vectorization model. K-means (unsupervised) with tf-idf also shows impressive results, only 4% lower in F1-score and precision.
Keywords: Random Forest; Sentiment Analysis; K-means; Machine Learning; Suicidal Ideation Detection
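
The abstract compares models by tf-idf vectorization, F1-score, and precision without defining them. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names `tfidf` and `precision_recall_f1` are hypothetical, tokenization is a plain whitespace split, and the weighting uses the textbook form tf * log(N / df), which may differ from the variant used in the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: whitespace tokenization and unsmoothed idf (textbook form).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

A term that occurs in every document receives weight 0 under this weighting, so only distinguishing vocabulary contributes to the vectors passed to a classifier or clustering algorithm. For a full pipeline, scikit-learn provides `TfidfVectorizer`, `RandomForestClassifier`, `KMeans`, and `f1_score`; the abstract does not state which tooling the authors used.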