Publisher: Centro Latinoamericano de Estudios en Informática
Abstract: Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in
Text Categorisation to improve accuracy while benefiting from learning
algorithms that only support two classes. An accurate ensemble relies on the
quality of its corresponding decomposition matrix, which in turn
depends on the separation between the categories and the diversity of the
dichotomies representing the binary classifiers. Important open questions
include finding a good definition for diversity between two dichotomies and a
way of combining all the pairwise diversity values into a single indicator that
we call the decomposition quality. In this work we introduce a new measure to
estimate the diversity between two learners and we compare it to the well-known
Hamming distance. We also examine three functions to evaluate the decomposition
quality. We present a set of experiments where these measures and functions are
tested using two distinct document corpora with several configurations in each.
The analysis of the results shows a weak relationship between the ensemble
accuracy and its diversity.
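
As an illustration of the quantities the abstract refers to, the following minimal sketch computes the Hamming distance between the dichotomies (columns) and between the category codewords (rows) of a decomposition matrix, then combines the pairwise values into a single number. The matrix, the helper names, and the two aggregation functions (minimum and mean) are assumptions chosen for illustration. They are not the new diversity measure or the three quality functions evaluated in the paper, which the abstract does not specify.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(vectors):
    """Hamming distance for every unordered pair of vectors."""
    return [hamming(u, v) for u, v in combinations(vectors, 2)]


# Hypothetical decomposition matrix: 4 categories (rows) x 5 dichotomies
# (columns). Entry +1 / -1 says on which side of a dichotomy a category falls.
matrix = [
    [+1, +1, +1, -1, -1],
    [+1, -1, -1, +1, -1],
    [-1, +1, -1, +1, +1],
    [-1, -1, +1, -1, +1],
]

# Transpose the matrix to obtain one tuple per dichotomy.
columns = list(zip(*matrix))

# Separation between categories: distances between rows (codewords).
row_separation = pairwise_distances(matrix)

# Diversity between dichotomies: distances between columns.
column_diversity = pairwise_distances(columns)

# Two illustrative ways of folding the pairwise values into one indicator
# (assumed here, not taken from the paper).
min_diversity = min(column_diversity)
mean_diversity = sum(column_diversity) / len(column_diversity)
```

Plain Hamming distance treats a dichotomy and its complement as maximally different even though they describe the same two-class problem, which is one reason an alternative diversity measure may be worth considering.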