Article Information

  • Title: Effective Unsupervised Arabic Word Stemming: Towards an Unsupervised Radicals Extraction
  • Author: Ahmed Khorsi
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Year: 2012
  • Volume: 9
  • Issue: 6
  • Publisher: Zarqa Private University
  • Abstract: This paper presents a new, fully unsupervised stemming approach for classical Arabic that is about 90% effective. The stemming is intended as a preparatory step toward unsupervised root (i.e., radicals) extraction. As learning input, the system requires no linguistic knowledge, only plain classical Arabic text. Once this input has been analyzed, the system can extract the strongest segment of a given length, namely the stem. We start with a definition of the targeted stem, then show that the system achieves about 90% true positives after learning from fewer than 15,000 words. Unlike other unsupervised approaches, ours does not assume that the input text is error-free, and it efficiently handles possible (in practice, very frequent) misspellings. The test corpus we used is an authoritative reference in classical Arabic, and it was rigorously labeled by a team of experts.
  • Keywords: Computational morphology; machine learning; natural language processing; classical Arabic; Semitic languages.
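The abstract does not say how the "strongest" segment of a given length is scored. Purely as an illustration, and assuming that strength can be approximated by the raw frequency of a length-k character segment in the learning text (an assumption, not necessarily the author's measure), the selection step could be sketched in Python as follows. The functions `count_segments` and `strongest_segment` and the tiny corpus are hypothetical, and this naive version does not attempt the misspelling handling the paper describes.

```python
from collections import Counter


def count_segments(corpus_words, k):
    """Count every contiguous segment of length k across the learning words.

    Hypothetical helper: frequency stands in for the paper's (unspecified)
    notion of segment strength.
    """
    counts = Counter()
    for word in corpus_words:
        for i in range(len(word) - k + 1):
            counts[word[i:i + k]] += 1
    return counts


def strongest_segment(word, counts, k):
    """Return the length-k segment of `word` with the highest corpus count.

    Ties go to the leftmost segment (max() keeps the first maximum).
    Returns None when the word is shorter than k.
    """
    if len(word) < k:
        return None
    candidates = [word[i:i + k] for i in range(len(word) - k + 1)]
    return max(candidates, key=lambda seg: counts[seg])


# Toy "learning input": four unvocalized forms sharing the segment كتب.
corpus = ["كتب", "يكتب", "كتبوا", "الكتب"]
counts = count_segments(corpus, 3)
print(strongest_segment("يكتبون", counts, 3))
```

On this toy corpus, the segment كتب occurs four times, so it is chosen over the other three-letter segments of يكتبون, each of which occurs at most once.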