Article information

  • Title: PHMM: Stemming on Persian Texts using Statistical Stemmer Based on Hidden Markov Model
  • Authors: Fatemeh Momenipour; Mohammadreza Keyvanpour
  • Journal: International Journal of Information Science and Management (IJISM)
  • Print ISSN: 2008-8302
  • Online ISSN: 2008-8310
  • Year of publication: 2016
  • Volume: 14
  • Issue: 2
  • Language: English
  • Publisher: REGIONAL INFORMATION CENTER FOR SCIENCE AND TECHNOLOGY
  • Abstract: Stemming is the process of finding the main morpheme of a word, and it is used in natural language processing, text mining, and information retrieval systems. A stemmer extracts the stem of a word. Persian stemmers fall into three main classes: structural stemmers, dictionary-based stemmers, and statistical stemmers. The precision of structural stemmers is low and the cost of dictionary-based stemmers is high, so the main goal of this research is to design and implement a high-precision statistical stemmer based on a hidden Markov model (HMM) that can reduce the size of the index file and increase the speed of information retrieval systems. The proposed stemmer finds the prefixes and suffixes of a word and removes them, so that the remainder of the word is the stem. However, some Persian words are exceptions that this procedure stems incorrectly, so we compiled a dictionary of Persian stems. The proposed stemmer first looks a word up in the dictionary; if the word is not there, it finds the stem with the HMM-based stemmer. The stemmer was tested on the Bijankhan corpus and the Hamshahri test collection. The results show an increase in mean average precision and recall. The algorithm also increases the speed of the information retrieval system and decreases the size of the index files.
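The two-stage lookup described in the abstract (dictionary first, statistical stemmer as fallback) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the transliterated words, the `EXCEPTIONS` dictionary, the affix lists, and the function names are all hypothetical, and `strip_affixes` is a simple rule-based stand-in for the HMM-based stemmer, which the abstract does not specify in detail.

```python
from typing import Callable, Dict

# Hypothetical exception dictionary (transliterated Persian); not from the paper.
EXCEPTIONS: Dict[str, str] = {
    "tanha": "tanha",  # "alone": ends in "ha" but is not a plural form
}

# Hypothetical affix lists, used only by the stand-in fallback below.
PREFIXES = ("mi", "na")
SUFFIXES = ("ha", "am")


def strip_affixes(word: str, min_stem: int = 2) -> str:
    """Stand-in for the HMM-based stemmer: strip one known prefix and one known suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[:-len(s)]
            break
    return word


def stem(word: str, fallback: Callable[[str], str] = strip_affixes) -> str:
    """Return the dictionary stem if the word is listed; otherwise use the fallback stemmer."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return fallback(word)
```

With these toy lists, `stem("ketabha")` strips the plural suffix and returns `"ketab"`, while `stem("tanha")` is answered by the dictionary; calling `strip_affixes("tanha")` directly would wrongly return `"tan"`, which is the kind of error the dictionary stage is meant to prevent. Passing the fallback as a parameter keeps the lookup logic independent of whichever statistical model is actually used.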