Article Information

  • Title: A Hybrid Stemmer of Punjabi Shahmukhi Script
  • Authors: Abdul Mateen; M. Kamran Malik; Zubair Nawaz
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Year of publication: 2017
  • Volume: 17
  • Issue: 8
  • Pages: 90-97
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: Stemming is a heuristic process that chops off the end of a word, and sometimes adds letters at the end, to obtain the basic meaningful form of a surface word. The basic goal of stemming is to reduce the inflectional forms of words to their root words using multiple techniques. In this paper, a hybrid approach is used for stemming Punjabi words. No stemmer has previously been reported for Punjabi in the Shahmukhi script. We combined a database lookup approach with rule-based stemming to build our Punjabi stemmer. Our dataset consists of 2.5 million tokens, divided into three parts of 1,500,000, 500,000 and 500,000 tokens that were used for training, development and testing respectively. Using 63 rules, our stemmer achieved 86.01% accuracy when tested on the dataset described above.
  • Keywords: Rule-based stemmer; morphology; lookup approach; root words; hybrid stemmer; affixes and normalization.
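
The hybrid approach described in the abstract (lookup first, then suffix rules that strip an ending and sometimes add letters back) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LOOKUP` table, the `RULES` list, the `MIN_STEM_LEN` guard and the English toy words are hypothetical and are not the paper's 63 rules or its Shahmukhi lexicon. The `accuracy` helper shows one plausible way a figure such as the reported 86.01% could be computed, as the fraction of test tokens whose predicted stem matches a reference root.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (surface form -> root); stands in for the database lookup.
LOOKUP: Dict[str, str] = {
    "children": "child",
    "went": "go",
}

# Hypothetical suffix rules: (suffix to strip, letters to add back).
# Listed longest suffix first so the most specific rule is tried first.
RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("es", ""),
    ("s", ""),
]

MIN_STEM_LEN = 2  # assumed guard so very short words are not over-stemmed


def stem(word: str) -> str:
    """Return a root for `word`: lookup first, then the first matching rule."""
    token = word.strip().lower()  # minimal normalization (assumed)
    if token in LOOKUP:
        return LOOKUP[token]
    for suffix, replacement in RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: len(token) - len(suffix)] + replacement
    return token  # no lookup entry and no rule applied


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (word, reference_root) pairs that the stemmer gets right."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, root in pairs if stem(word) == root)
    return correct / len(pairs)
```

In this sketch the lookup table takes priority, so irregular forms that no suffix rule could handle are resolved before any rule is tried; whether the paper orders the two stages this way is not stated in the abstract.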