
Basic Article Information

  • Title: SMAD: Text Classification of Arabic Social Media Dataset for News Sources
  • Authors: Amira M. Gaber; Mohamed Nour El-din; Hanan Moussa
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Online ISSN: 2156-5570
  • Publication year: 2021
  • Volume: 12
  • Issue: 10
  • DOI: 10.14569/IJACSA.2021.0121058
  • Language: English
  • Publisher: Science and Information Society (SAI)
  • Abstract: Due to advances in technology, social media has become the most popular means of propagating news. Many news items are published on social media platforms such as Facebook, Twitter, and Instagram, but they are not categorized into domains such as politics, education, finance, art, sports, and health. Text classification is therefore needed to sort the news into domains, which reduces the volume of news a reader must sift through on social media, saves the time and effort of recognizing the category or domain, and presents the data in a way that improves searching. Most existing datasets do not go through pre-processing and filtering and are not organized according to classification standards, so they are not ready for use. Thus, Arabic Natural Language Processing (ANLP) phases are used to pre-process, normalize, and categorize the news into the right domain. This paper proposes an Arabic Social Media Dataset (SMAD) for text classification on social media using ANLP steps. The SMAD dataset consists of 15,240 categorized Arabic news items from the Facebook social network. The experimental results show that the SMAD corpus yields an accuracy of about 98% across five domains (Art, Education, Health, Politics, and Sport). The SMAD dataset has been trained and tested and is ready for use.
  • Keywords: Text classification; Arabic text classification; Arabic Natural Language Processing (ANLP)