Article Information

Title: Spelling Correction for Text Documents in Bahasa Indonesia Using Finite State Automata and Levinshtein Distance Method
Authors: Viny Christanti Mawardi; Niko Susanto; Dali Santun Naga; et al.
Journal: MATEC Web of Conferences
Electronic ISSN: 2261-236X
Year: 2018
Volume: 164
DOI: 10.1051/matecconf/201816401047
Language: English
Publisher: EDP Sciences
Abstract: Any mistake in writing a document causes its information to be conveyed incorrectly. Today most documents are written on a computer, so spelling correction is needed to fix writing mistakes. This work describes the design of a spelling corrector for Indonesian-language text documents. It takes a document's text as input and produces a .txt file as output. For the implementation, 5,000 news articles were used as training data. The methods used include Finite State Automata (FSA), Levenshtein distance, and N-grams. The results are evaluated by perplexity, correction hit rate, and false positive rate. The unigram model has the lowest perplexity, 1.14. The bigram and trigram models share the highest correction hit rate, 71.20 %, but the bigram model is faster, with an average processing time of 01:21.23 min. The unigram, bigram, and trigram models have the same false positive rate, 4.15 %. Because of drawbacks in the FSA method, the method was modified, which raised the bigram correction hit rate to 85.44 %.
Keywords: Finite state automata; Levenshtein distance; N-gram; Spelling correction
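The abstract names Finite State Automata and Levenshtein distance but does not show how the two fit together. One common way to combine them is to store the lexicon in a trie. A trie is a simple deterministic finite state automaton whose accepting states mark complete words. A word that the automaton does not accept is flagged as a misspelling. The corrector then walks the trie, filling one Levenshtein row per state and pruning any branch whose best distance already exceeds a threshold. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The helper names (`build_trie`, `accepts`, `candidates`), the threshold `max_dist`, and the tiny Indonesian word list are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: using a trie as the dictionary automaton is an
# assumption, not the paper's published code.


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # char -> next state
    is_word: bool = False                         # accepting state


def build_trie(words):
    """Build a deterministic automaton (trie) that accepts exactly `words`."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def accepts(root, word):
    """True if the automaton ends in an accepting state after reading `word`."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


def candidates(root, word, max_dist=1):
    """Return (lexicon_word, distance) pairs within max_dist, closest first."""
    results = []

    def walk(node, prefix, prev_row):
        for ch, child in node.children.items():
            # Levenshtein row for (prefix + ch) against every prefix of `word`.
            row = [prev_row[0] + 1]
            for j in range(1, len(word) + 1):
                row.append(min(
                    row[j - 1] + 1,
                    prev_row[j] + 1,
                    prev_row[j - 1] + (word[j - 1] != ch),
                ))
            if child.is_word and row[-1] <= max_dist:
                results.append((prefix + ch, row[-1]))
            # Prune: no extension of this prefix can get below min(row).
            if min(row) <= max_dist:
                walk(child, prefix + ch, row)

    walk(root, "", list(range(len(word) + 1)))
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


lexicon = ["makan", "makanan", "minum", "rumah", "murah"]  # hypothetical lexicon
root = build_trie(lexicon)
print(accepts(root, "mkan"))        # False: flagged as a misspelling
print(candidates(root, "mkan", 1))  # [('makan', 1)]
```

Pruning on `min(row)` is safe because extending a prefix can never lower the smallest distance in its row. This lets the search skip most of a large lexicon instead of computing a full edit distance against every word.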
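The abstract uses N-gram models to choose among correction candidates and compares those models by perplexity. The minimal Python sketch below shows a bigram model with add-one (Laplace) smoothing. The method `prob` estimates P(word | previous word), and `best_candidate` picks the candidate that the preceding word makes most likely. The method `perplexity` computes exp(-(1/N) Σ log P(w_i | w_{i-1})) over a sentence of N tokens, so lower values mean the model finds the text less surprising. The class name `BigramModel`, the `<s>` start marker, the add-one smoothing, and the toy corpus are assumptions; the abstract does not say which smoothing the authors used.

```python
import math
from collections import Counter

# Illustrative sketch: add-one smoothing and the "<s>" marker are assumptions,
# not details taken from the paper.


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.bigrams = Counter()  # count(prev, word)
        self.context = Counter()  # count(prev) as the left side of a bigram
        self.vocab = set()        # words the model can predict
        for tokens in sentences:
            padded = ["<s>"] + list(tokens)
            self.vocab.update(tokens)
            for prev, word in zip(padded, padded[1:]):
                self.bigrams[(prev, word)] += 1
                self.context[prev] += 1

    def prob(self, prev, word):
        """Smoothed P(word | prev); unseen pairs still get a non-zero value."""
        v = len(self.vocab)
        return (self.bigrams[(prev, word)] + 1) / (self.context[prev] + v)

    def perplexity(self, tokens):
        """exp of the average negative log-probability per predicted token."""
        padded = ["<s>"] + list(tokens)
        log_prob = sum(math.log(self.prob(p, w)) for p, w in zip(padded, padded[1:]))
        return math.exp(-log_prob / len(tokens))

    def best_candidate(self, prev, options):
        """Pick the correction candidate most likely to follow `prev`."""
        return max(options, key=lambda w: self.prob(prev, w))


corpus = [  # hypothetical training sentences
    ["saya", "makan", "nasi"],
    ["saya", "minum", "air"],
    ["dia", "makan", "roti"],
]
model = BigramModel(corpus)
print(model.best_candidate("saya", ["makanan", "makan"]))  # makan
print(model.perplexity(["saya", "makan", "nasi"]))
print(model.perplexity(["nasi", "saya", "makan"]))  # higher: unusual word order
```

A unigram model would ignore `prev` entirely. That is one reason the bigram and trigram models can reach a higher hit rate even when a unigram model scores lower perplexity.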
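The abstract reports a correction hit rate and a false positive rate but does not define them. This sketch assumes one plausible reading. The hit rate is the share of misspelled input tokens that the system changes into the intended word. The false positive rate is the share of already-correct tokens that the system wrongly changes. The function name `evaluate` and the token-aligned input format are hypothetical, and the paper's exact definitions may differ.

```python
# Illustrative sketch: these metric definitions are assumptions, not taken
# from the paper.


def evaluate(original, corrected, gold):
    """Return (correction_hit_rate, false_positive_rate) for aligned tokens."""
    if not (len(original) == len(corrected) == len(gold)):
        raise ValueError("token lists must be aligned")
    errors = hits = clean = false_pos = 0
    for orig, out, ref in zip(original, corrected, gold):
        if orig != ref:               # token was misspelled in the input
            errors += 1
            hits += out == ref        # fixed to the intended word
        else:                         # token was already correct
            clean += 1
            false_pos += out != ref   # system changed a correct word
    hit_rate = hits / errors if errors else 0.0
    fp_rate = false_pos / clean if clean else 0.0
    return hit_rate, fp_rate


original = ["saya", "mkan", "nsi", "di", "rumah"]     # hypothetical input
corrected = ["saya", "makan", "nasi", "di", "murah"]  # hypothetical output
gold = ["saya", "makan", "nasi", "di", "rumah"]       # intended text
print(evaluate(original, corrected, gold))
```

In this toy example both misspellings are fixed, giving a hit rate of 1.0. One of the three correct words ("rumah" changed to "murah") becomes a false positive, giving a false positive rate of 1/3.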