Abstract: Over the past decades, the number of historical corpora available has steadily
grown. Perhaps the best-known and most widely used is the Helsinki Corpus.
(See Kytö 1996[1991] for a description of the corpus and Rissanen et al. 1993
for a range of possible applications.) Other historical corpora include ARCHER
(A Representative Corpus of Historical English Registers), the Corpus of Early
English Correspondence (CEEC), the Innsbruck Computer Archive of Machine-
Readable English Texts (ICAMET), the Lampeter Corpus of Early Modern
English Tracts, and the Zurich English Newspaper Corpus (ZEN), to name just a
few (cf. Biber et al. 1994; Fries 1994; Schmied 1994; Keränen 1998; Markus
1999a). However, given their relatively small size, these historical corpora are
unfortunately only of limited value for the study of less frequent features of the
English language. The Helsinki Corpus, for instance, spans almost a thousand
years (ca. 750 to 1700) but contains only 1.57 million words. Even for the period
of Late Modern English, suitable corpus data is far from abundant. For
example, although ARCHER covers a shorter time span (1650 to 1990) and
offers detailed categorization by register, its overall size of fewer than two million
words still imposes many of the same limitations as the Helsinki Corpus.1