Basic article information

Title: Syntactic Ngrams as Keystructures Reflecting Typical Syntactic Patterns of Corpora in Finnish
Authors: Veronika Laippala; Jenna Kanerva; et al.
Journal: Procedia - Social and Behavioral Sciences
Print ISSN: 1877-0428
Publication year: 2015
Volume: 198
Pages: 233-241
DOI: 10.1016/j.sbspro.2015.07.441
Language: English
Publisher: Elsevier
Abstract: This article studies syntactic ngrams, i.e. little subtrees of dependency syntax analyses, as keystructures reflecting syntactic characteristics of corpora. While traditional keywords correspond to statistically more or less frequent words of a corpus and are often informative on the corpus topic and style, unlexicalized syntactic ngrams applied in this study extend the level of description beyond individual words to sequences of syntactic elements. The article analyzes the utility of these sequences in corpus description and gives first results on the structural characteristics reflected by them in the studied texts, including Finnish literature, Internet forum discussions from the major Finnish social networking website and Internet discussions following the news and editorials of the major Finnish newspaper's website. The syntactic ngrams are produced with the freely available Finnish Dependency Parser and Ngram Builder and the keystructures analyzed with a linear classifier. The results suggest that syntactic ngrams illustrate both topical features, such as names and Internet urls discussed in the corpora, as well as structural characteristics, such as subject-verb combinations, negations and informal sentence structures, thus both generalizing the information given by traditional keywords from individual words to concepts and providing new knowledge about typical constructions not reached by lexemes.
Keywords: Keyness; syntactic ngrams; computer-mediated communication