
Article Information

  • Title: Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes
  • Authors: Muhidin Mohamed; Mourad Oussalah
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2014
  • Volume: 5
  • Issue: 7
  • DOI: 10.14569/IJACSA.2014.050725
  • Publisher: Science and Information Society (SAI)
  • Abstract: This paper describes an approach to named entity classification based on Wikipedia article infoboxes. It identifies the three fundamental named entity types: Person, Location, and Organization. An entity is classified by matching the attributes extracted from its article infobox against core entity attributes built from Wikipedia Infobox Templates. Experimental results showed that the classifier can achieve accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million named entities of these three types was created from the 20140203 Wikipedia dump. Experiments on the CoNLL2003 shared task named entity recognition (NER) dataset showed the system's strong performance in comparison with three different state-of-the-art systems.
  • Keywords: thesai; IJACSA; thesai.org; journal; IJACSA papers; named entity identification; Wikipedia infobox; infobox templates; Named Entity Classification (NEC)
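
The attribute-matching idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the core attribute sets, the `classify_entity` function, and the `min_overlap` threshold are all hypothetical and chosen only for illustration, whereas the paper builds its core attributes from Wikipedia Infobox Templates.

```python
# Hypothetical core attribute sets, for illustration only. The paper derives
# its own core attributes from Wikipedia Infobox Templates.
CORE_ATTRIBUTES = {
    "Person": {"birth_date", "birth_place", "occupation", "nationality", "spouse"},
    "Location": {"coordinates", "population_total", "area_total_km2", "elevation_m", "timezone"},
    "Organization": {"founded", "headquarters", "industry", "num_employees", "key_people"},
}


def classify_entity(infobox_attributes, min_overlap=2):
    """Return the entity type whose core attributes overlap most with the
    given infobox attribute names, or None if the best overlap is below
    min_overlap (an assumed threshold, not taken from the paper)."""
    # Normalize attribute names so that matching ignores case and padding.
    attrs = {name.strip().lower() for name in infobox_attributes}

    # Count how many core attributes of each type appear in the infobox.
    scores = {etype: len(attrs & core) for etype, core in CORE_ATTRIBUTES.items()}

    # Ties are resolved by dictionary order, which is an arbitrary choice here.
    best_type = max(scores, key=scores.get)
    if scores[best_type] < min_overlap:
        return None
    return best_type


if __name__ == "__main__":
    print(classify_entity(["birth_date", "occupation", "spouse", "website"]))
    print(classify_entity(["Coordinates", "population_total", "website"]))
    print(classify_entity(["website"]))
```

A plain overlap count is the simplest possible matching rule; a real system would likely weight attributes by how distinctive they are for each type.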