
Article information

  • Title: A French corpus annotated for multiword expressions and named entities
  • Authors: Marie Candito; Mathieu Constant; Carlos Ramisch
  • Journal: Journal of Language Modelling
  • Print ISSN: 2299-856X
  • Electronic ISSN: 2299-8470
  • Publication year: 2020
  • Volume: 8
  • Issue: 2
  • Pages: 415-479
  • DOI: 10.15398/jlm.v8i2.265
  • Language: English
  • Publisher: Polish Academy of Sciences
  • Abstract: We present the enrichment of a French treebank of various genres with a new annotation layer for multiword expressions (MWEs) and named entities (NEs). Our contribution with respect to previous work on NE and MWE annotation is the particular care taken to use formal criteria, organized into decision flowcharts, shedding some light on the interactions between NEs and MWEs. Moreover, in order to cope with the well-known difficulty to draw a clear-cut frontier between compositional expressions and MWEs, we chose to use sufficient criteria only. As a result, annotated MWEs satisfy a varying number of sufficient criteria, accounting for the scalar nature of the MWE status. In addition to the span of the elements, annotation includes the subcategory of NEs (e.g., person, location) and one matching sufficient criterion for non-verbal MWEs (e.g., lexical substitution). The 3,099 sentences of the treebank were double-annotated and adjudicated, and we paid attention to cross-type consistency and compatibility with the syntactic layer. Overall inter-annotator agreement on non-verbal MWEs and NEs reached 71.1%. The released corpus contains 3,112 annotated NEs and 3,440 MWEs, and is distributed under an open license.
  • Keywords: multiword expressions; annotation; corpus; French