Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Online ISSN: 1694-0814
Publication year: 2013
Volume: 10
Issue: 2
Publisher: IJCSI Press
Abstract: Machine translation has developed and improved rapidly, but long sentences remain a problem in this field. Using phrase chunking to reduce sentence length is therefore a promising way to improve translation quality. In this paper, we present a lexical analysis approach, phrase chunking, applied to French sentences in combination with a French-Vietnamese bilingual dictionary. We also define the boundaries of the chunks in order to create a set of French-Vietnamese bilingual segments and thereby overcome the limitations caused by long sentences. We tested the system on a French-Vietnamese bilingual corpus of 10,000 sentence pairs and evaluated it on a sample of 100 sentence pairs from this corpus after chunking. The system achieved an accuracy of more than 90% and an F-measure of 91.61%.
Keywords: Bilingual corpus; machine translation; extraction of parallel corpus; chunk alignment; French Tree Bank corpus; Conditional Random Fields.