Article Information

  • Title: Text Coherence Analysis based on Misspelling Oblivious Word Embeddings and Deep Neural Network
  • Authors: Anwar Hussen Wadud; Rashadul Hasan Rakib
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2021
  • Volume: 12
  • Issue: 1
  • Pages: 194-203
  • DOI: 10.14569/IJACSA.2021.0120124
  • Publisher: Science and Information Society (SAI)
  • Abstract: Text coherence analysis is a more challenging task in Natural Language Processing (NLP) than other subfields such as text generation, translation, or text summarization. Many text coherence methods exist in NLP; most are graph-based or entity-based and target short text documents. For long text documents, however, the existing methods yield low accuracy, which is the biggest challenge in text coherence analysis for both English and Bengali. This is because existing methods do not account for misspelled words in a sentence and therefore cannot assess text coherence accurately. This paper proposes a text coherence analysis method based on the Misspelling Oblivious Word Embedding Model (MOEM) and a deep neural network. The MOEM replaces all misspelled words with the correct words and captures the interaction between different sentences by calculating their matches using word embeddings. A deep neural network architecture is then used to train and test the model. The study examines two datasets, one in Bengali and one in English, to analyze text coherence based on sentence sequencing tasks and to evaluate the effectiveness of the model. The Bengali dataset contains 7121 text documents, of which 5696 (80%) are used for training and 1425 (20%) for testing. The English dataset contains 7500 text documents, of which 6000 (80%) are used for training and 1500 (20%) for model evaluation. The proposed model is compared with existing text coherence analysis techniques on both the English and Bengali datasets. Experimental results show that the proposed model significantly improves automatic text coherence detection, with 98.1% accuracy in English and 89.67% accuracy in Bengali.
  • Keywords: Coherence analysis; deep neural network; distributional representation; misspellings; NLP; word embedding
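
The abstract describes two steps that precede the neural network: replacing misspelled words with their correct forms, then measuring how well adjacent sentences match using word embeddings. The following is a minimal toy sketch of that idea, not the authors' MOEM. The vocabulary, the 3-dimensional vectors, the fuzzy-matching cutoff, the averaging of word vectors, and all function names are assumptions made for illustration; the paper's actual embedding model and matching function are not specified in this record.

```python
import difflib
import math

# Toy 3-dimensional embeddings. The values are invented for illustration
# and do not come from the paper or from any trained model.
EMBEDDINGS = {
    "the": [0.1, 0.1, 0.1],
    "cat": [0.9, 0.1, 0.0],
    "sat": [0.7, 0.3, 0.1],
    "slept": [0.8, 0.2, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
    "fell": [0.1, 0.2, 0.7],
}


def correct(word):
    """Return the closest in-vocabulary word, or None if nothing is close.

    Assumption: simple string similarity stands in for the paper's
    misspelling handling.
    """
    if word in EMBEDDINGS:
        return word
    matches = difflib.get_close_matches(word, list(EMBEDDINGS), n=1, cutoff=0.6)
    return matches[0] if matches else None


def sentence_vector(sentence):
    """Average the embeddings of the (corrected) words in a sentence."""
    words = [correct(w) for w in sentence.lower().split()]
    vectors = [EMBEDDINGS[w] for w in words if w is not None]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacent_similarities(sentences):
    """Similarity score for each pair of consecutive sentences."""
    vectors = [sentence_vector(s) for s in sentences]
    return [cosine(a, b) for a, b in zip(vectors, vectors[1:])]


if __name__ == "__main__":
    document = ["the cat sat", "the cat slept", "the stock markte fell"]
    print(adjacent_similarities(document))
```

In this sketch the misspelled "markte" is mapped to "market" before embedding lookup, and the topic shift between the second and third sentences shows up as a lower adjacent-sentence score. In the approach the abstract describes, such sentence-matching signals would feed a deep neural network rather than be read off directly.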