Basic Article Information

  • Title: An Approach of Chunk Alignment for French-Vietnamese Bilingual Corpora
  • Authors: Ngoc Tan Le; Ngoc Tien Le; Dien Dinh
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Publication year: 2013
  • Volume: 10
  • Issue: 2
  • Publisher: IJCSI Press
  • Abstract: Machine translation has developed and improved very quickly, but long sentences remain a problem in this domain. Using phrase chunking to reduce sentence length, and thereby improve translation quality, is therefore a promising approach. In this paper, we present an approach based on lexical analysis (phrase chunking) applied to French sentences in combination with a French-Vietnamese bilingual dictionary. We also define the boundaries of the chunks to create a set of French-Vietnamese bilingual segments, in order to overcome the limitations caused by long sentences. We tested the system model on a French-Vietnamese bilingual corpus of 10,000 sentence pairs and evaluated it on a sample of 100 sentence pairs from this corpus after chunking by our system. The system achieved an accuracy of more than 90% and an F-measure of 91.61%.
  • Keywords: Bilingual corpus; machine translation; extraction of parallel corpus; chunk alignment; French Tree Bank corpus; Conditional Random Fields.