Article Information

  • Title: Cocytus: parallel NLP over disparate data
  • Authors: Noah Evans; Masayuki Asahara; Yuji Matsumoto
  • Journal: Traitement Automatique des Langues
  • Print ISSN: 1248-9433
  • Online ISSN: 1965-0906
  • Year: 2008
  • Volume: 49
  • Issue: 2
  • Publisher: ATALA - Assoc Traitement Automatique Langues
  • Abstract: As NLP deals with larger datasets and more computationally expensive algorithms, cutting-edge NLP research is increasingly becoming the province of companies such as Google, which can devote enormous resources to NLP tasks. Smaller institutions are being left behind. Beyond this lack of resources, the resources a typical researcher does have are represented in a variety of differing, incompatible data formats and operating system semantics. NLP researchers spend considerable research time developing tools that support these different data formats, time that could be spent on productive research. To address these problems of data representation and large-scale data processing, this paper presents Cocytus, a platform for creating NLP tools, loosely based on Unix, that handles different data formats and parallel computation transparently, allowing institutions to make maximum use of their resources.