
Article Information

  • Title: Chinese Named Entity Recognition in the Geoscience Domain Based on BERT
  • Authors: Xia Lv; Zhong Xie; Dexin Xu
  • Journal: Earth and Space Science
  • Electronic ISSN: 2333-5084
  • Publication year: 2022
  • Volume: 9
  • Issue: 3
  • Pages: n/a
  • DOI: 10.1029/2021EA002166
  • Language: English
  • Publisher: John Wiley & Sons, Ltd.
  • Abstract: Geological reports are frequently used by geologists involved in geological surveys and scientific research to record the results and outcomes of those surveys. Despite this rich data source, a substantial amount of knowledge has yet to be mined and analyzed. This paper focuses on automatic information extraction from geological reports, namely, geological named entity recognition (NER). Geological NER plays an important role in data mining, knowledge discovery and knowledge graph construction. Existing general-purpose NER models and tools perform poorly in the geoscience domain because of the language irregularities of geological text, such as informal sentence structures, domain-specific geoscience terms, long character lengths and multiple combinations of independent words. We present BERT-BiGRU-CRF, a deep learning-based geological NER model that combines bidirectional encoder representations from transformers (BERT), a bidirectional gated recurrent unit network (BiGRU) and a conditional random field (CRF), and that is designed specifically with these linguistic irregularities in mind. The integrated model uses the BERT pretrained language model to obtain character vectors rich in semantic information, which compensates for the lack of specificity of static word vectors (e.g., word2vec) and improves the extraction of complex geological entities. We demonstrate the proposed model by applying it to four test datasets, including a geoscience NER dataset built from regional geological reports, and by comparing its performance with that of five baseline models.