Article information

标题：Retrieval of Spelling Variants in Nonstandard Texts – Automated Support and Visualization
Authors: Thomas Pilz; Wolfram Luther; Ulrich Ammon et al.
Journal: SKY Journal of Linguistics
Print ISSN: 1456-8438
Online ISSN: 1796-279X
Year: 2008
Volume: 21
Publisher: The Linguistic Association of Finland
Abstract: This article describes ongoing research in the RSNSR (Regelbasierte Suche in Textdatenbanken mit nichtstandardisierter Rechtschreibung, "Rule-based search in text databases with nonstandard orthography") project. The project focuses on making historical text documents digitally available; it therefore examines the challenges these documents pose for digitization procedures and for subsequent retrieval operations such as fuzzy full-text search. Difficulties arise from low-quality facsimile scans, old typefaces, inconsistent transcriptions and, above all, typical optical character recognition (OCR) errors and spelling variation. The article discusses recent solutions to these problems, concentrating on stochastic string edit distance measures, so-called evidences, and the avoidance of static dictionaries. By presenting visualization approaches for retrieval in and browsing of historical databases and nonstandard text documents, as well as a prototype for the visual evaluation of distance measures, it proposes a further development of information visualization in linguistics.
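To make the idea of a probability-weighted edit distance for spelling variants more concrete, here is a minimal sketch. It is not the authors' method: it is a simplified weighted Levenshtein distance in which each edit operation costs -log(p), so the result is the negative log-probability of the single cheapest edit path. The probability tables, the names `SUB_PROB`, `DEL_PROB`, `INS_PROB`, `DEFAULT_PROB` and `variant_distance`, and the example words are all hypothetical; in a real system such probabilities would be estimated from aligned pairs of historical and modern spellings.

```python
import math

# Hypothetical, illustrative operation probabilities (NOT taken from the article).
SUB_PROB = {("v", "u"): 0.2, ("u", "v"): 0.2, ("y", "i"): 0.15, ("c", "k"): 0.1}
DEL_PROB = {"h": 0.1, "e": 0.05}   # characters often dropped in a variant
INS_PROB = {"h": 0.1, "e": 0.05}   # characters often added in a variant
DEFAULT_PROB = 0.001               # probability assumed for any other operation


def cost(p: float) -> float:
    """Convert an operation probability into an additive cost."""
    return -math.log(p)


def variant_distance(a: str, b: str) -> float:
    """Weighted edit distance from spelling a to spelling b.

    Low-probability operations are expensive, so likely spelling variants
    (e.g. 'thun' vs. 'tun') end up closer than arbitrary strings.
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # delete all of a[:i]
        d[i][0] = d[i - 1][0] + cost(DEL_PROB.get(a[i - 1], DEFAULT_PROB))
    for j in range(1, m + 1):  # insert all of b[:j]
        d[0][j] = d[0][j - 1] + cost(INS_PROB.get(b[j - 1], DEFAULT_PROB))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0.0 if x == y else cost(SUB_PROB.get((x, y), DEFAULT_PROB))
            d[i][j] = min(
                d[i - 1][j - 1] + sub,                             # match/substitute
                d[i - 1][j] + cost(DEL_PROB.get(x, DEFAULT_PROB)),  # delete x
                d[i][j - 1] + cost(INS_PROB.get(y, DEFAULT_PROB)),  # insert y
            )
    return d[n][m]


if __name__ == "__main__":
    # Rank hypothetical candidate spellings against a modern query word.
    query = "tun"
    for word in sorted(["tanz", "thun", "vnd"], key=lambda w: variant_distance(w, query)):
        print(word, round(variant_distance(word, query), 3))
```

Summing over all edit paths instead of taking the minimum would give a fuller probabilistic model; the minimum is used here only because it keeps the sketch to a standard dynamic program.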