
Basic article information

  • Title: Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method
  • Authors: Viny Christanti Mawardi ; Niko Susanto ; Dali Santun Naga
  • Journal: MATEC Web of Conferences
  • Electronic ISSN: 2261-236X
  • Publication year: 2018
  • Volume: 164
  • DOI: 10.1051/matecconf/201816401047
  • Language: English
  • Publisher: EDP Sciences
  • Abstract: Any mistake in writing a document can cause its information to be conveyed incorrectly. These days, most documents are written on a computer, so spelling correction is needed to fix writing mistakes. This work describes the design of a spelling corrector for Indonesian-language text documents, which takes a document's text as input and produces a .txt file as output. For training, 5 000 news articles were used. The methods used are Finite State Automata (FSA), Levenshtein distance, and N-grams. The results are evaluated by perplexity, correction hit rate, and false positive rate. The lowest perplexity, 1.14, is obtained with the unigram model. The highest correction hit rate, 71.20 %, is reached by both the bigram and trigram models, but the bigram model has the better average processing time, 01:21.23 min. The false positive rate is the same for the unigram, bigram, and trigram models: 4.15 %. Because of the disadvantages of the FSA method, a modification was made that raises the bigram correction hit rate to 85.44 %.
  • Keywords: Finite state automata; Levenshtein distance; N-gram; Spelling correction
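
For readers unfamiliar with the Levenshtein distance named in the abstract, the following is a minimal Python sketch of the standard unit-cost edit distance, plus a simple candidate ranking. It is not the authors' implementation: the function names `levenshtein` and `suggest`, the `max_distance` threshold, and the tiny Indonesian word list in the usage note are all assumptions made for illustration, and the FSA and N-gram stages described in the abstract are not modelled here.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary entries within max_distance, closest first."""
    scored = sorted((levenshtein(word, entry), entry) for entry in vocabulary)
    return [entry for distance, entry in scored if distance <= max_distance]
```

As a usage example under the same assumptions, `suggest("makn", ["makan", "minum", "main"])` ranks the candidates for a misspelled word by edit distance alone; a system like the one in the abstract would additionally use N-gram context to choose among candidates at the same distance.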