Article Information

  • Title: Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information
  • Authors: Sonja Nießen; Hermann Ney
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2004
  • Volume: 30
  • Issue: 2
  • Pages: 181-204
  • DOI: 10.1162/089120104323093285
  • Language: English
  • Publisher: MIT Press
  • Abstract: In statistical machine translation, correspondences between the words in the source and the target language are learned from parallel corpora, and often little or no linguistic knowledge is used to structure the underlying models. In particular, existing statistical systems for machine translation often treat different inflected forms of the same lemma as if they were independent of one another. The bilingual training data can be better exploited by explicitly taking into account the interdependencies of related inflected forms. We propose the construction of hierarchical lexicon models on the basis of equivalence classes of words. In addition, we introduce sentence-level restructuring transformations which aim at the assimilation of word order in related sentences. We have systematically investigated the amount of bilingual training data required to maintain an acceptable quality of machine translation. The combination of the suggested methods for improving translation quality in frameworks with scarce resources has been successfully tested: We were able to reduce the amount of bilingual training data to less than 10% of the original corpus, while losing only 1.6% in translation quality. The improvement of the translation results is demonstrated on two German-English corpora taken from the Verbmobil task and the Nespole! task.