Basic Article Information

  • Title: Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks
  • Authors: Ruth O'Donovan ; Michael Burke ; Aoife Cahill
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2005
  • Volume: 31
  • Issue: 3
  • Pages: 329-366
  • DOI: 10.1162/089120105774321073
  • Language: English
  • Publisher: MIT Press
  • Abstract: We present a methodology for extracting subcategorization frames based on an automatic lexical-functional grammar (LFG) f-structure annotation algorithm for the Penn-II and Penn-III Treebanks. We extract syntactic-function-based subcategorization frames (LFG semantic forms) and traditional CFG category-based subcategorization frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. In contrast to many other approaches, ours does not predefine the subcategorization frame types extracted, learning them instead from the source data. Including particles and prepositions, we extract 21,005 lemma frame types for 4,362 verb lemmas, with a total of 577 frame types and an average of 4.8 frame types per verb. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource. To our knowledge, this is the largest and most complete evaluation of subcategorization frames acquired automatically for English.
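
The abstract states that frames are assigned probabilities conditional on the lemma, but it does not say how those probabilities are estimated. As a minimal sketch, assuming plain relative-frequency estimation over (lemma, frame) occurrences, the idea might look like the Python below. The function names, the tuple encoding of frames, and the toy data are hypothetical illustrations and are not taken from the article.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency (an assumed estimator).

    `observations` is an iterable of (lemma, frame) pairs, one pair per
    verb occurrence. Returns a dict of the form {lemma: {frame: probability}}.
    """
    pair_counts = Counter(observations)

    # Total number of occurrences of each lemma, summed over its frames.
    lemma_totals = Counter()
    for (lemma, _frame), n in pair_counts.items():
        lemma_totals[lemma] += n

    probs = defaultdict(dict)
    for (lemma, frame), n in pair_counts.items():
        probs[lemma][frame] = n / lemma_totals[lemma]
    return dict(probs)


def average_frame_types(probs):
    """Distinct (lemma, frame) pairs divided by the number of lemmas."""
    lemma_frame_types = sum(len(frames) for frames in probs.values())
    return lemma_frame_types / len(probs)


# Hypothetical toy data: frames are encoded as tuples of grammatical
# functions, with "obl:to" standing for an oblique headed by "to".
toy_observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj", "obl:to")),
    ("sleep", ("subj",)),
]

probs = frame_probabilities(toy_observations)
for lemma, frames in probs.items():
    for frame, p in frames.items():
        print(lemma, frame, round(p, 3))
print("average frame types per lemma:", average_frame_types(probs))
```

Applied to the counts reported in the abstract, the same ratio gives 21,005 / 4,362 ≈ 4.8 frame types per verb lemma.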