Publisher: Information and Media Technologies Editorial Board
Abstract: Question classification, an important phase in question answering systems, is the task of identifying the type of a given question from a set of predefined types. This study combines unlabeled questions with labeled questions in semi-supervised learning to improve the precision of question classification. As the semi-supervised algorithm we selected Tri-training, because it is a simple but efficient co-training style algorithm. However, Tri-training is not well suited to question data, so we propose two modifications to make it more suitable. To give its three classifiers different initial hypotheses, Tri-training bootstrap-samples the original labeled set to obtain a different training set for each classifier, and this sampling lowers the precision of the three classifiers. To avoid this drawback, each classifier should be trained initially on the whole original labeled set while the diversity of the three classifiers is still ensured. Our first proposal is therefore to use multiple learning algorithms for the classifiers in Tri-training; the second is to use multiple algorithms in combination with multiple views. Our experiments show promising results.
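The first proposal can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the three toy classifiers (`NearestCentroid`, `OneNN`, `Stump`), the helpers `tri_train` and `vote`, and the numeric toy data are all hypothetical, and the abstract does not say which algorithms or features were used. The sketch keeps only the core co-labeling step (an unlabeled example is added to one classifier's training set when the other two agree on its label) and leaves out the error-rate checks of the original Tri-training as well as the multiple-view variant.

```python
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    """Predicts the class whose mean vector is closest."""
    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))


class OneNN:
    """Predicts the label of the single nearest training example."""
    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        return min(self.data, key=lambda p: dist2(x, p[0]))[1]


class Stump:
    """One-feature threshold rule with the lowest training error.
    Assumes at least one feature takes two distinct values."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            vals = sorted({xi[j] for xi in X})
            for lo, hi in zip(vals, vals[1:]):
                t = (lo + hi) / 2
                left = Counter(yi for xi, yi in zip(X, y) if xi[j] <= t)
                right = Counter(yi for xi, yi in zip(X, y) if xi[j] > t)
                l_lab, l_hit = left.most_common(1)[0]
                r_lab, r_hit = right.most_common(1)[0]
                err = len(y) - l_hit - r_hit
                if best is None or err < best[0]:
                    best = (err, j, t, l_lab, r_lab)
        _, self.j, self.t, self.left, self.right = best
        return self

    def predict(self, x):
        return self.left if x[self.j] <= self.t else self.right


def tri_train(models, X_lab, y_lab, X_unl, rounds=3):
    """Simplified Tri-training with three different algorithms and no bootstrap."""
    X_lab, y_lab = list(X_lab), list(y_lab)
    for m in models:
        m.fit(X_lab, y_lab)  # every classifier starts from the full labeled set
    for _ in range(rounds):
        preds = [[m.predict(x) for x in X_unl] for m in models]
        for i, m in enumerate(models):
            j, k = [n for n in range(3) if n != i]
            # Pseudo-label the examples on which the other two classifiers agree.
            extra = [(x, preds[j][n]) for n, x in enumerate(X_unl)
                     if preds[j][n] == preds[k][n]]
            m.fit(X_lab + [x for x, _ in extra], y_lab + [lab for _, lab in extra])
    return models


def vote(models, x):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


# Toy usage: two well-separated clusters stand in for two question types.
X_lab = [(0, 0), (0, 1), (5, 5), (5, 6)]
y_lab = ["a", "a", "b", "b"]
X_unl = [(1, 0), (1, 1), (4, 5), (6, 6), (0.5, 0.5), (5.5, 5)]
models = tri_train([NearestCentroid(), OneNN(), Stump()], X_lab, y_lab, X_unl)
print(vote(models, (0.2, 0.3)), vote(models, (5.2, 5.1)))
```

Because the three classifiers come from different algorithm families, they can disagree even when trained on identical data, which is the source of diversity that bootstrap sampling provides in the original algorithm.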