Journal: International Journal of Data Mining & Knowledge Management Process
Print ISSN: 2231-007X
Online ISSN: 2230-9608
Year of publication: 2011
Volume: 1
Issue: 3
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Text summarization is one of the most challenging tasks in information retrieval. It is a response to the explosion of electronic documents and can be seen as the condensation of a document collection. Text summarization lets a user get a sense of the content of a full text, or learn its information content, without reading every sentence in it. This data reduction helps the user find the required information quickly, without wasting time reading the whole text. We present a query-based document summarizer based on sentence similarity and word frequency. We used the AQUAINT-2 Information-Retrieval Text Research Collections, and the resulting summary sentences are evaluated using ROUGE metrics. The summarizer does not use any expensive linguistic data. It uses the Vector Space Model to find sentences similar to the query and Sum Focus to find word frequency, and it achieved high Recall and Precision scores. The accuracy achieved using the proposed method is comparable to that of the best systems presented in recent academic competitions, i.e., TAC (Text Analysis Conference).
Keywords: Summarization; Sentence Similarity; Word Frequency; Query-based summarization.
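
The abstract describes ranking sentences by their vector-space similarity to a query, combined with a word-frequency signal. The following is a minimal sketch of that general idea, not the authors' implementation: the function names (`tokenize`, `cosine`, `summarize`), the regex tokenizer, the mixing weight `alpha`, and the averaged relative-frequency score are all assumptions made for illustration. In particular, the frequency term is a plain stand-in and does not reproduce Sum Focus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, query, k=2, alpha=0.7):
    # alpha is an assumed weight: query similarity vs. word-frequency score.
    query_vec = Counter(tokenize(query))
    sent_tokens = [tokenize(s) for s in sentences]
    doc_freq = Counter(t for toks in sent_tokens for t in toks)
    total = sum(doc_freq.values()) or 1

    scored = []
    for i, toks in enumerate(sent_tokens):
        sim = cosine(Counter(toks), query_vec)
        # Average relative frequency of the sentence's words in the document.
        freq = sum(doc_freq[t] for t in toks) / (len(toks) * total) if toks else 0.0
        scored.append((alpha * sim + (1 - alpha) * freq, i))

    # Take the k best-scoring sentences, then restore original document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


if __name__ == "__main__":
    doc = [
        "Cats sleep most of the day.",
        "The stock market fell sharply today.",
        "Investors sold stock after the market report.",
    ]
    print(summarize(doc, "stock market", k=1))
```

Returning the selected sentences in document order, rather than score order, is a design choice that tends to keep an extractive summary readable; a ROUGE-style evaluation like the one mentioned in the abstract would be applied to that output separately.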