Basic Article Information

  • Title: Dialect Identification in Nuanced Arabic Tweets Using Farasa Segmentation and AraBERT
  • Author: Anshul Wadhawan
  • Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2021
  • Volume: 2021
  • Pages: 291-295
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: This paper presents our approach to the EACL WANLP-2021 Shared Task 1: Nuanced Arabic Dialect Identification (NADI). The task aims to develop a system that identifies the geographical location (country/province) from which an Arabic tweet, written in Modern Standard Arabic or a dialect, originates. We solve the task in two parts. The first part involves pre-processing the provided dataset by cleaning, adding and segmenting various parts of the text. The second part involves experiments with different versions of two Transformer-based models, AraBERT and AraELECTRA. Our final approach achieved macro F1-scores of 0.216, 0.235, 0.054, and 0.043 in the four subtasks, and we were ranked second in the MSA identification subtasks and fourth in the DA identification subtasks.