Publisher: Information and Media Technologies Editorial Board
Abstract: We propose a transitive query translation system for cross-language information retrieval (CLIR) in which the source language has poor data resources. Our aim is to perform transitive translation using minimal data resources for the source language (Indonesian) while exploiting the data resources available for the target language (Japanese). We evaluated two kinds of translation: a pure transitive translation, and a combination of direct and transitive translation. In the transitive translation, English serves as the pivot language. The translation consists of two main steps. The first is a keyword translation process, which attempts to translate each keyword using the available resources; it draws on several target language resources, such as a Japanese proper name dictionary and an English-Japanese (pivot-target language) bilingual dictionary. The second step selects the best among the available translations by combining a mutual information score (computed from a target language corpus) with a TF × IDF score. Results on the NTCIR 3 (NII-NACSIS Test Collection for IR Systems) Web Retrieval Task show that this translation method achieved a higher IR score than machine translation (using the Kataku (Indonesian-English) and Babelfish/Excite (English-Japanese) engines). The transitive translation achieved about 38% of the monolingual retrieval performance, and the combination of direct and transitive translation achieved about 49%, which is comparable to the English-Japanese IR task.
Keywords: Transitive Translation; Bilingual Dictionary; Limited Resource Language; Cross Language Information Retrieval; Indonesian-Japanese
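
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The dictionaries, the five-document corpus, and all function names below are hypothetical toy data chosen for illustration. The sketch generates Japanese candidates through an English pivot and then keeps the candidate combination whose terms co-occur most strongly in the target corpus, scored by pointwise mutual information over document co-occurrence. The TF × IDF component is left out, because the abstract does not say how the two scores are combined.

```python
import math
from itertools import combinations, product

# Hypothetical toy dictionaries (not the resources used in the paper).
SRC_TO_PIVOT = {"bunga": ["flower", "interest"], "bank": ["bank"]}
PIVOT_TO_TGT = {
    "flower": ["花"],
    "interest": ["利子", "興味"],
    "bank": ["銀行", "土手"],
}

# Hypothetical target-language corpus: each document is a set of terms.
DOCS = [
    {"銀行", "利子", "金利"},
    {"銀行", "利子"},
    {"花", "庭"},
    {"土手", "川"},
    {"興味", "本"},
]


def transitive_candidates(term, src_to_pivot, pivot_to_tgt):
    """Collect target candidates by translating source -> pivot -> target."""
    candidates = []
    for pivot in src_to_pivot.get(term, []):
        for target in pivot_to_tgt.get(pivot, []):
            if target not in candidates:
                candidates.append(target)
    return candidates


def pmi(a, b, docs):
    """Pointwise mutual information of two terms from document co-occurrence."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return float("-inf")  # the terms never co-occur in the corpus
    return math.log((n_ab * n) / (n_a * n_b))


def select_best(query_terms, src_to_pivot, pivot_to_tgt, docs):
    """Pick one candidate per query term, maximizing the summed pairwise PMI."""
    per_term = [
        transitive_candidates(t, src_to_pivot, pivot_to_tgt) for t in query_terms
    ]
    best, best_score = None, float("-inf")
    for combo in product(*per_term):
        score = sum(pmi(a, b, docs) for a, b in combinations(combo, 2))
        if best is None or score > best_score:
            best, best_score = combo, score
    return list(best) if best else []


print(select_best(["bunga", "bank"], SRC_TO_PIVOT, PIVOT_TO_TGT, DOCS))
```

In this toy example the ambiguous Indonesian word "bunga" (flower or interest) is resolved to 利子 (financial interest), because that is the only candidate that co-occurs with a translation of "bank" in the sample corpus. Enumerating every combination is exponential in the query length, so a real system would need pruning or a greedy search.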