Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Year: 2021
Volume: 12
Issue: 11
DOI: 10.14569/IJACSA.2021.0121133
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Muslims are required to conduct Takhrij to validate the truth of a Hadith text, especially when it is obtained from online media. Traditionally, Takhrij is conducted by experts and applied to Arabic Hadith text. This study introduces a contextual similarity model based on BERT embedding to handle Takhrij on Indonesian Hadith text. It examines the effectiveness of BERT fine-tuning on six pre-trained models to produce embedding models. The results show that BERT fine-tuning improves the average accuracy of the embedding models by 47.67%, with a mean of 0.956845. The highest accuracy, 1.00, was achieved by the BERT embedding built on the indobenchmark/indobert-large-p2 pre-trained model. In addition, the manual evaluation achieved 91.67% accuracy.
Keywords: Hadith text; Takhrij; natural language processing; text-similarity; word embedding; BERT fine-tuning
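
Note: the abstract describes matching a query text against known Hadith texts by embedding similarity, but does not say which similarity measure is used. The following is a minimal sketch that assumes cosine similarity, a common choice for comparing embeddings. The three-dimensional vectors and the labels `hadith_a`, `hadith_b`, `hadith_c` are hypothetical stand-ins for real BERT embeddings, and the function names are illustrative, not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, corpus):
    """Return (label, score) of the corpus entry closest to query_vec."""
    scored = (
        (label, cosine_similarity(query_vec, vec))
        for label, vec in corpus.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical embeddings; a real system would obtain these from a
# fine-tuned BERT model rather than hard-coding them.
corpus = {
    "hadith_a": [0.9, 0.1, 0.0],
    "hadith_b": [0.1, 0.8, 0.3],
    "hadith_c": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]

best_label, best_score = most_similar(query, corpus)
print(best_label, round(best_score, 3))
```

In practice, a threshold on the best score would likely be needed to decide whether the query matches any known text at all; that threshold is not given in the abstract.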