Abstract: In this paper we use a distance d between sequences of N-grams to identify N-grams that behave differently when two sequences of N-grams are compared. With this tool, we inspect written texts in European Portuguese dated between the 16th and the 19th centuries. We identify the most volatile N-grams across the period, as well as N-grams that should be considered when studying the linguistic changes from Classical Portuguese to Modern Portuguese. We find that 2-grams composed of unstressed monosyllables followed by paroxytone words (and vice versa) change markedly from one text to the next throughout the whole period. Stressed monosyllabic words (SMW) reveal discrepancies between written texts of the 16th century and texts from the beginning of the 17th century. Among the 2-grams involved are (i) an SMW followed by a paroxytone or oxytone word and (ii) a disyllabic paroxytone word or an oxytone word followed by an SMW.
Keywords: Bayesian information criterion; Partition Markov models; Proximity between N-grams.