Article information

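Title: Análisis del tamaño y especificidad de los corpus en la evaluación de resúmenes mediante el LSA: Un análisis comparativo entre LSA y jueces expertos (English: Analysis of corpus size and specificity in the evaluation of summaries using LSA: A comparative analysis between LSA and expert judges)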
Authors: Ricardo Olmos; José Antonio León; Inmaculada Escudero; et al.
Journal: Revista Signos
Print ISSN: 0035-0451
Electronic ISSN: 0718-0934
Year: 2009
Volume: 42
Issue: 69
Pages: 71-81
Language: Spanish
Publisher: Pontificia Universidad Católica de Valparaíso
Abstract (translated from Spanish): Latent Semantic Analysis (LSA) is a sophisticated computational tool for semantic analysis that can obtain a mathematical representation of the meaning of words or texts. Among other applications, LSA has proven efficient at evaluating texts. The tool acquires the mathematical representation of texts by first analyzing a linguistic corpus composed of digitized documents. The main objective of this study was to analyze which properties different linguistic corpora (general, condensed, diversified, and base corpus) must have so that LSA's evaluations of summaries resemble as closely as possible those made by 4 human judges. The summaries were written by 390 Spanish students in Primary education, ESO (compulsory secondary education), and university. The results indicated that, for the evaluation to be satisfactorily efficient, the corpora need not be as general or as large as those used at Boulder (composed of millions of texts and more than one million words), nor too specific (fewer than 300 texts and 5000 words).
Abstract (English version): Latent Semantic Analysis (LSA) is an automatic statistical method for representing the meanings of words and text passages. A growing body of evidence supports the reliability of LSA as a tool for assessing the semantic similarity between units of discourse, and LSA has also proved comparable to human judgments of document similarity. The tool acquires its mathematical representation of texts by first analyzing a linguistic corpus composed of digitized documents. The main objective of this study was to analyze which properties different linguistic corpora (general, condensed, diversified, and base corpus) should have so that the LSA assessment of summaries is as similar as possible to the assessment made by four human raters. Three hundred and ninety Spanish middle and high school students (14-16 years old) and undergraduate students read a narrative text and later summarized it. Findings indicate that, for the assessment to be satisfactorily efficient, the corpora need be neither as general and as large as those used in Boulder (made up of millions of texts and over one million words) nor too specific (fewer than 300 texts and 5000 words).
Keywords: Latent Semantic Analysis (LSA); summary; discourse assessment; linguistic corpus; university students
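The abstracts describe LSA only at a high level: it derives a mathematical representation of meaning from a corpus and then scores summaries by semantic similarity. Below is a minimal Python sketch of that general idea, not the authors' pipeline. It assumes NumPy, a naive whitespace tokenizer, log-entropy term weighting (a common LSA choice, assumed here), a truncated SVD with a hypothetical dimension `k`, and cosine similarity between a folded-in source text and each summary. The toy corpus and texts are invented for illustration. Changing `corpus` corresponds, loosely, to the variable the study manipulates (general versus more specific corpora).

```python
import numpy as np


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization (no stemming,
    # no punctuation or stop-word handling).
    return text.lower().split()


def fit_lsa(corpus, k):
    """Build a k-dimensional LSA space from a list of documents (needs >= 2)."""
    vocab = sorted({w for doc in corpus for w in tokenize(doc)})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(corpus)))
    for j, doc in enumerate(corpus):
        for w in tokenize(doc):
            counts[index[w], j] += 1

    # Assumed log-entropy weighting: local weight log(1 + tf), global weight
    # g_i = 1 + sum_j p_ij * log(p_ij) / log(n_docs).
    p = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(len(corpus))
    weighted = np.log1p(counts) * g[:, None]

    # Truncated SVD: keep the k left singular vectors with the largest
    # singular values (np.linalg.svd returns them in descending order).
    u, _, _ = np.linalg.svd(weighted, full_matrices=False)
    return index, g, u[:, :k]


def embed(text, index, g, u_k):
    """Fold a new text into the space as q^T U_k; unknown words are ignored."""
    q = np.zeros(len(index))
    for w in tokenize(text):
        if w in index:
            q[index[w]] += 1
    return (np.log1p(q) * g) @ u_k


def cosine(a, b):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


# Toy usage (hypothetical data): two topics, a fox in a forest vs. a market.
corpus = [
    "the fox ran through the forest",
    "the fox hid in the forest",
    "the hunter searched the forest for the fox",
    "the market sold bread and cheese",
    "bread and cheese were cheap at the market",
]
index, g, u_k = fit_lsa(corpus, k=2)
source = embed("the fox ran into the forest to hide from the hunter", index, g, u_k)
good = embed("a fox hid in the forest", index, g, u_k)
bad = embed("bread and cheese at the market", index, g, u_k)
print(f"on-topic summary:  {cosine(source, good):.3f}")
print(f"off-topic summary: {cosine(source, bad):.3f}")
```

Because every text is projected with the same `u_k`, the sign ambiguity of the SVD cancels out in the cosine. With a corpus this small the scores are only illustrative; the study's question is precisely how corpus size and specificity affect how closely such scores track human raters.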