
Article Information

  • Title: Atar: Attention-based LSTM for Arabizi transliteration
  • Authors: Bashar Talafha; Analle Abuammar; Mahmoud Al-Ayyoub
  • Journal: International Journal of Electrical and Computer Engineering
  • Electronic ISSN: 2088-8708
  • Publication year: 2021
  • Volume: 11
  • Issue: 3
  • Pages: 2327-2334
  • DOI: 10.11591/ijece.v11i3.pp2327-2334
  • Publisher: Institute of Advanced Engineering and Science (IAES)
  • Abstract: A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is the collection and public release of the first large-scale “Arabizi to Arabic script” parallel corpus, which focuses on the Jordanian dialect and consists of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields an accuracy of 79% and a BLEU score of 88.49.
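The abstract does not say which attention variant Atar uses. As a minimal sketch, assuming Luong-style dot-product scoring (a common choice, not confirmed as Atar's), one decoder step of attention over per-character encoder states looks like this. The function name `dot_product_attention` and the toy vectors are hypothetical, for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot_product_attention(query: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Return (attention_weights, context_vector) for one decoder step.

    Hypothetical sketch: dot-product scoring is an assumption here; the
    abstract does not state which scoring function Atar uses.
    """
    # Score each encoder state (one per input Arabizi character) against the decoder query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # Numerically stable softmax turns the scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


# Toy encoder states for a three-character Arabizi input such as "7ob" (values are made up).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = dot_product_attention([2.0, 0.0], states)
```

In a full encoder-decoder, the context vector would be combined with the decoder's LSTM state to predict the next Arabic-script character; the weights show which input characters the decoder attends to at that step.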