Article information

Title: Cocytus: parallel NLP over disparate data
Authors: Noah Evans; Masayuki Asahara; Yuji Matsumoto; et al.
Journal: Traitement Automatique des Langues
Print ISSN: 1248-9433
Electronic ISSN: 1965-0906
Year of publication: 2008
Volume: 49
Issue: 2
Publisher: ATALA - Assoc Traitement Automatique Langues
Abstract: As NLP deals with larger datasets and more computationally expensive algorithms, cutting-edge NLP research is increasingly becoming the province of companies like Google that can devote an astronomical amount of resources to NLP tasks. Smaller institutions are being left behind. Beyond this lack of resources, the resources a typical researcher does have access to are represented in a variety of differing, incompatible data formats and operating system semantics. NLP researchers spend a large amount of research time developing NLP tools to support these different data formats, time that could be spent on productive research. To solve these problems of data representation and of processing huge datasets, this paper presents Cocytus, a platform for creating NLP tools. Cocytus is loosely based on Unix and handles different data formats and parallel computation transparently, allowing institutions to make maximum use of their resources.