
Article Information

  • Title: GKEEP: An Enhanced Graph‐Based Keyword Extractor With Error‐Feedback Propagation for Geoscience Reports
  • Authors: Qinjun Qiu; Zhong Xie; Hong Xie
  • Journal: Earth and Space Science
  • Electronic ISSN: 2333-5084
  • Publication year: 2021
  • Volume: 8
  • Issue: 5
  • Pages: e2020EA001602
  • DOI: 10.1029/2020EA001602
  • Publisher: John Wiley & Sons, Ltd.
  • Abstract: As the amount of published geoscience literature grows, reading and summarizing large collections of texts has become a challenging task. Publication keywords can be considered basic components of knowledge structure representations and have been used to reveal knowledge concerning research domains. Compared with other research domains, little work on textual geoscience data has addressed keyword extraction. In this paper, we propose an unsupervised algorithm, the graph‐based keyword extractor with error‐feedback propagation (GKEEP), that enhances graph‐based keyword extraction approaches by using an error‐feedback mechanism similar to the concept of backpropagation. The proposed approach comprises the following steps. A preprocessed document is used as the input of the proposed model and is represented as a weighted undirected graph, where the vertices represent words and the edges represent the cooccurrence relationships between words within a fixed window size. Its nodes are then ranked by importance scores calculated with a graph‐based ranking algorithm. The resulting word scores are used to compute the scores of keyword candidates. Subsequently, the Word2Vec method is applied to recalculate the scores of the keyword candidates and rank them to select the final keywords. The model also uses error feedback to boost the rankings of the most salient terms that would otherwise be deemed less important. In empirical experiments on two real data sets (including our newly built data set), the proposed GKEEP model outperforms state‐of‐the‐art unsupervised models and existing graph‐based ranking models. The proposed method can effectively reflect intrinsic keyword semantics and interrelationships.
    Plain Language Abstract: Common or frequently used terms receive higher scores in traditional graph‐based extraction because more edges are connected to them. This paper proposes a graph‐based keyword extraction (KE) algorithm using error‐feedback propagation, which uses the semantics of word embeddings to assist in extracting keywords from geoscience reports. We hope that our approach will serve as an alternative method that deserves further study.
  • Keywords: backpropagation; error feedback; geoscience reports; keyword extraction; TextRank; Word2Vec
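
The first two steps described in the abstract (a windowed cooccurrence graph followed by graph‐based ranking) can be illustrated with a minimal TextRank‐style sketch. This is not the authors' implementation: the function names, the window size, the damping factor, and the iteration count are all assumptions chosen for illustration, and the Word2Vec rescoring and error‐feedback steps that distinguish GKEEP are deliberately left out.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build a weighted undirected graph from a preprocessed token list.

    graph[u][v] counts how often u and v appear within `window` tokens
    of each other. The window size is an illustrative assumption.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        # Pair the token at position i with the next (window - 1) tokens.
        for v in tokens[i + 1:i + window]:
            if u != v:
                graph[u][v] += 1.0
                graph[v][u] += 1.0
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score words with a weighted, TextRank-style iteration.

    The damping factor and the fixed iteration count are assumed
    defaults, not values taken from the paper.
    """
    scores = {word: 1.0 for word in graph}
    # Total edge weight attached to each word, used to normalise its votes.
    strength = {word: sum(nbrs.values()) for word, nbrs in graph.items()}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                weight / strength[nbr] * scores[nbr]
                for nbr, weight in graph[word].items()
            )
            for word in graph
        }
    return scores


# Hypothetical, already preprocessed token sequence.
tokens = ["granite", "intrusion", "granite", "dike", "granite", "fault"]
graph = build_cooccurrence_graph(tokens, window=2)
scores = rank_words(graph)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 3))
```

In this toy input the frequent word is connected to every other word and therefore ranks first, which is the bias toward common terms that the Plain Language Abstract mentions and that the embedding‐based rescoring is meant to counteract.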