Basic article information

Title: esTenTen, a Vast Web Corpus of Peninsular and American Spanish
Authors: Adam Kilgarriff; Irene Renau et al.
Journal: Procedia - Social and Behavioral Sciences
Print ISSN: 1877-0428
Publication year: 2013
Volume: 95
Pages: 12-19
DOI: 10.1016/j.sbspro.2013.10.617
Language: English
Publisher: Elsevier
Abstract: Everyone working on general language would like their corpus to be bigger, wider-coverage, cleaner, duplicate-free, and with richer metadata. As a response to that wish, Lexical Computing Ltd. has a programme to develop very large ‘TenTen’ web corpora. In this paper we introduce the Spanish corpus, esTenTen, of 8 billion words and 19 different national varieties of Spanish. We investigate the distance between the national varieties as represented in the corpus, and examine in detail the keywords of Peninsular Spanish vs. American Spanish, finding a wide range of linguistic, cultural and political contrasts.
Keywords: corpus linguistics; Sketch Engine; Spanish dialects; TenTen corpora