Venue: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, the features used, and the task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
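The SVM variant mentioned in point (iii) replaces raw word counts with NB log-count ratios before training a linear SVM. A minimal Python sketch of the ratio computation is below; the function name is illustrative, binarized per-document counts and add-one smoothing (alpha = 1) are assumed, and the resulting dictionary would be used to rescale each document's feature vector before SVM training.

```python
import math
from collections import Counter

def log_count_ratio(pos_docs, neg_docs, alpha=1.0):
    """Compute NB log-count ratios r[t] = log(p_hat[t] / q_hat[t]),
    where p_hat and q_hat are smoothed, normalized token counts over
    the positive and negative documents respectively."""
    vocab = set()
    p, q = Counter(), Counter()
    for doc in pos_docs:
        for tok in set(doc):      # set(): binarized per-document counts
            p[tok] += 1
            vocab.add(tok)
    for doc in neg_docs:
        for tok in set(doc):
            q[tok] += 1
            vocab.add(tok)
    p_norm = sum(p[t] + alpha for t in vocab)
    q_norm = sum(q[t] + alpha for t in vocab)
    return {t: math.log(((p[t] + alpha) / p_norm) /
                        ((q[t] + alpha) / q_norm))
            for t in vocab}

# Toy corpus (hypothetical data, for illustration only)
pos = [["great", "movie"], ["great", "acting"]]
neg = [["terrible", "movie"], ["boring", "plot"]]
r = log_count_ratio(pos, neg)

# A document's SVM feature vector is its binarized bag-of-words
# scaled elementwise by r; e.g. for the snippet ["great", "movie"]:
features = {t: r[t] for t in ["great", "movie"]}
```

Tokens seen only in positive documents get a positive ratio, tokens seen only in negative documents a negative one, and evenly distributed tokens a ratio near zero; a linear SVM trained on these rescaled vectors is the variant the abstract evaluates.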