Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2017
Volume: 17
Issue: 8
Pages: 90-97
Publisher: International Journal of Computer Science and Network Security
Abstract: Stemming is a heuristic process that chops off the end of a word, and sometimes appends additional letters, to obtain the basic meaningful form of a surface word. The basic goal of stemming is to reduce the inflectional forms of words to their root words, which can be done using multiple techniques. In this paper, a hybrid approach is used for stemming Punjabi words. No stemmer has previously been reported for Punjabi in the Shahmukhi script. We combine a database lookup approach with rule-based stemming in our Punjabi stemmer. Our dataset consists of 2.5 million tokens, divided into three parts of 1,500,000, 500,000 and 500,000 tokens, which were used for training, development and testing respectively. Using 63 rules, our stemmer achieved 86.01% accuracy when tested on the dataset specified above.
Keywords: Rule-based stemmer; morphology; lookup approach; root words; hybrid stemmer; affixes and normalization.
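
Illustrative sketch: the hybrid approach described in the abstract (a database lookup followed by rule-based suffix handling) could look roughly like the Python below. This is not the authors' implementation. The lookup entries, the suffix rules, the minimum stem length and the names `LOOKUP`, `RULES`, `stem` and `accuracy` are all assumptions made for illustration, and the words are romanized toy examples rather than Shahmukhi text or any of the paper's 63 rules.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table mapping surface forms directly to root words.
# The paper's actual database is not reproduced here.
LOOKUP: Dict[str, str] = {"kurian": "kuri"}

# Hypothetical rules as (suffix to strip, letters to append), ordered so
# that longer suffixes are tried first. These are NOT the paper's rules.
RULES: List[Tuple[str, str]] = [("ian", "i"), ("e", "a")]

# Assumed guard: never strip a word down to fewer than this many letters.
MIN_STEM_LEN = 2


def stem(word: str,
         lookup: Dict[str, str] = LOOKUP,
         rules: List[Tuple[str, str]] = RULES) -> str:
    """Return a root form: lookup first, then the first matching rule."""
    # Step 1: database lookup handles known and irregular forms.
    if word in lookup:
        return lookup[word]
    # Step 2: rule-based fallback strips a suffix and may add letters back.
    for suffix, replacement in rules:
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= MIN_STEM_LEN:
            return word[:remaining] + replacement
    # Step 3: no rule applies, so the word is returned unchanged.
    return word


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (word, expected_root) pairs that stem() gets right."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return correct / len(pairs)
```

Checking the lookup first lets known forms bypass the rules entirely, while the rule list covers words missing from the table. A figure such as the reported accuracy would be computed by running something like `accuracy` over the held-out test portion.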