Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Online ISSN: 1817-3195
Year: 2020
Volume: 98
Issue: 16
Pages: 3233-3244
Publisher: Journal of Theoretical and Applied Information Technology
Abstract: In this paper, an abstractive Arabic text summarization model is proposed, based on a sequence-to-sequence recurrent neural network encoder-decoder architecture. The proposed model consists of two layers of hidden states at the encoder and one layer of hidden states at the decoder; all encoder and decoder layers use long short-term memory (LSTM) units. The two encoder layers are an input text layer and a named entities layer. The inputs to the input text layer are the word embeddings of the input text words, while the inputs to the named entities layer are the word embeddings of the named entities in the input text. In all layers, the word embeddings come from one of the AraVec pre-trained word embedding models. Furthermore, a global attention mechanism is used by the decoder to generate the summary words. A special dataset was collected and used for training and evaluating the abstractive summarization model, and the proposed model was evaluated using the ROUGE1 and ROUGE1-NOORDER measures. The experimental results show that the proposed model provides good results in terms of ROUGE1 and ROUGE1-NOORDER, with values of 38.4 and 46.4 respectively. Finally, a comparison is made between the word2vec and dependency-parsing-based word2vec word embedding models. The abstractive summarization models that use the dependency-based word2vec model outperformed those that use the original word2vec model; hence, the quality of the word embedding strongly affects the quality of the generated summary.
Keywords: Deep Learning; Abstractive Text Summarization; Recurrent Neural Network; Attention Mechanism; LSTM; ROUGE.
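
To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation. Only the overall layout comes from the abstract: two LSTM encoder layers (one over the input text, one over its named entities) and a single LSTM decoder layer that attends globally over the encoded states. The module name DualEncoderSummarizer, the hidden size, the shared embedding table, the dot-product (Luong-style) form of the global attention, and the choice to initialize the decoder from the text encoder's final state are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderSummarizer(nn.Module):
    """Sketch of a seq2seq summarizer with two encoder LSTM layers
    (input text + named entities) and a globally attending LSTM decoder."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256):
        super().__init__()
        # In practice this table would be initialized from AraVec
        # pre-trained vectors (e.g., the 300-dimensional models).
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.text_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.entity_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn_out = nn.Linear(hidden_dim * 2, hidden_dim)
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, text_ids, entity_ids, summary_ids):
        # Encode the input text and its named entities separately.
        text_enc, (h_t, c_t) = self.text_encoder(self.embedding(text_ids))
        ent_enc, _ = self.entity_encoder(self.embedding(entity_ids))
        # Concatenate both encoders' output sequences as the attention
        # memory; initialize the decoder from the text encoder's final
        # state (an assumption, not stated in the abstract).
        memory = torch.cat([text_enc, ent_enc], dim=1)
        dec_out, _ = self.decoder(self.embedding(summary_ids), (h_t, c_t))
        # Global attention: dot-product scores over every encoder state,
        # softmax-normalized, then a context vector per decoder step.
        scores = torch.bmm(dec_out, memory.transpose(1, 2))
        context = torch.bmm(F.softmax(scores, dim=-1), memory)
        combined = torch.tanh(self.attn_out(torch.cat([dec_out, context], dim=-1)))
        return self.vocab_proj(combined)  # logits over the summary vocabulary

Concatenating the two encoders' output sequences lets a single attention distribution span both the text tokens and the named entities, which is one plausible reading of how the two encoder layers feed the decoder; the paper itself should be consulted for the exact combination scheme.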