Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: A central problem in historical linguistics is the identification of historically related cognate words. We present a generative phylogenetic model for automatically inducing cognate group structure from unaligned word lists. Our model represents the process of transformation and transmission from ancestor word to daughter word, as well as the alignment between the word lists of the observed languages. We also present a novel method for simplifying complex weighted automata created during inference to counteract the otherwise exponential growth of message sizes. On the task of identifying cognates in a dataset of Romance words, our model significantly outperforms a baseline approach, increasing accuracy by as much as 80%. Finally, we demonstrate that our automatically induced groups can be used to successfully reconstruct ancestral words.