Article Information

  • Title: CTSS: A Tool for Efficient Information Extraction with Soft Matching Rules for Text Mining
  • Authors: Christy, A.; Thambidurai, P.
  • Journal: Journal of Computer Science
  • Print ISSN: 1549-3636
  • Year: 2008
  • Volume: 4
  • Issue: 5
  • Pages: 375-381
  • DOI: 10.3844/jcssp.2008.375.381
  • Publisher: Science Publications
  • Abstract: The abundance of information available digitally in the modern world has created a demand for structured information. Text mining, the problem of discovering useful information in unstructured text, has attracted the attention of researchers. The role of Information Extraction (IE) software is to identify relevant information in texts, extracting it from a variety of sources and aggregating it into a single view. Existing information extraction systems depend on particular corpora and achieve poor recall. Making such a system domain-independent while also improving recall is therefore an important challenge for IE. In this research, the authors propose a domain-independent information extraction algorithm, called SOFTRULEMINING, for extracting the aim, methodology and conclusion from technical abstracts. The algorithm is implemented by combining a trigram model with soft matching rules. A tool, CTSS, was built using SOFTRULEMINING and tested on technical abstracts from www.computer.org and www.ansinet.org. The tool was found to improve recall, and in turn precision, in comparison with other search engines.
  • Keywords: Parsing; trigram model; soft matching; information extraction; recall; precision