Abstract: This article describes ongoing research in the RSNSR (Regelbasierte Suche in
Textdatenbanken mit nichtstandardisierter Rechtschreibung, “Rule-based search in text
databases with nonstandard orthography”) project. The focus of this project is making
historical text documents digitally available; consequently, it examines the challenges
for digitization procedures and subsequent retrieval operations, such as fuzzy full-text
search. Difficulties are posed by scans of low-quality facsimiles, old font types,
inconsistent transcriptions and, especially, typical optical character recognition (OCR)
errors and spelling variation. This article discusses recent solutions to such problems,
concentrating on stochastic string edit distance measures, so-called evidences and the
avoidance of static dictionaries. By presenting visualization approaches for searching
and browsing historical databases and nonstandard text documents, as well as a
prototype for the visual evaluation of distance measures, it proposes an advancement of
information visualization in linguistics.