Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2019
Volume: 10
Issue: 8
Pages: 108-112
Publisher: Science and Information Society (SAI)
Abstract: Sentiment analysis in non-English languages can be
more challenging than in English because of the
scarcity of publicly available resources for building
high-accuracy prediction models. To alleviate this under-resourced
problem, this article proposes using a byte-level
recurrent neural model to generate text representations for
Twitter sentiment analysis in the Indonesian language. Because the
main part of the proposed model's training is unsupervised and
does not require much labeled data, the approach can
scale by using the large amounts of unlabeled data that are easily
gathered on the Internet, without much dependence on human-generated
resources. This paper also introduces an Indonesian
dataset for general sentiment analysis. It consists of 10,806
tweets selected from a total of 454,559
tweets gathered directly from Twitter using the Twitter API. The
10,806 tweets are classified into three categories: positive,
negative, and neutral. This Indonesian dataset could help the
development of Indonesian sentiment analysis, especially general
sentiment analysis, and encourage others to publish
similar datasets in the future.
Keywords: Sentiment analysis; under-resourced problem;
Indonesian dataset; Twitter