Journal: International Journal of Modern Education and Computer Science
Print ISSN: 2075-0161
Online ISSN: 2075-017X
Year: 2019
Volume: 11
Issue: 4
Pages: 23-31
DOI: 10.5815/ijmecs.2019.04.03
Publisher: MECS Publisher
Abstract: Document summarization is receiving growing attention across NLP tasks. To date, the main challenges in producing compact summaries have been full information coverage, coherence of the output sentences, and non-redundancy (the absence of similar sentences). Although some research has been carried out on compact summaries, there have been few empirical investigations into the coherence of the output sentences. The aim of this paper is to explore a comprehensive and useful methodology for generating coherent summaries. The approach taken in this study is a mixed method based on the most likely n-grams and the word2vec algorithm, which converts the separated sentences into numeric, normalized matrices. The paper then extracts statistical properties from these matrices. Using a greedy approach, the sentences most relevant to the main subject of the document are selected and placed in the output summary. This greedy method is the backbone of the proposed system: an iterative procedure that maximizes two features of the summary, conceptual coherence and subject-matter diversity. The proposed approach is compared against a similar model, Q_Network, and shows the superiority of its algorithm on long text documents.
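The greedy selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes sentences have already been embedded as L2-normalized vectors (e.g. averaged word2vec vectors), and it uses an MMR-style score in which relevance to the document centroid stands in for conceptual coherence and a redundancy penalty stands in for subject-matter diversity; the function name, the `lam` trade-off parameter, and the scoring details are illustrative assumptions.

```python
import numpy as np

def greedy_summary(sent_vecs, k, lam=0.5):
    """Greedily pick k sentence indices, balancing relevance to the
    document centroid (a proxy for coherence with the main subject)
    against similarity to already-selected sentences (a proxy for
    subject-matter diversity).

    sent_vecs: (n, d) array of L2-normalized sentence embeddings.
    lam: trade-off between relevance and redundancy (assumed value).
    """
    centroid = sent_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    selected = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            # Cosine relevance to the overall document subject.
            relevance = float(sent_vecs[i] @ centroid)
            # Worst-case similarity to anything already in the summary.
            redundancy = max(
                (float(sent_vecs[i] @ sent_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given two near-duplicate sentence vectors and one distinct vector, a two-sentence summary should keep one of the duplicates and the distinct sentence rather than both duplicates.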
Keywords: Natural language processing; Extractive summarization; Text coherence; Word vector; Language models