Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Publication year: 2022
Volume: 13
Issue: 2
DOI: 10.14569/IJACSA.2022.0130283
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Textual entailment is a relationship between two text fragments: the text (or premise) and the hypothesis. It has applications in question answering systems, multi-document summarization, information retrieval systems, and social network analysis. In the digital era, recognizing semantic variability is important for understanding inferences in texts, which may take the form of sentences, posts, tweets, or user experiences. Understanding inferences from customer experiences, for example, helps companies with customer segmentation. The volume of digital textual data keeps growing in almost all languages, including low-resource languages. This work examines various machine learning approaches to textual entailment recognition, also called natural language inference, for Malayalam, a low-resource South Indian language. A performance-based analysis of the classification techniques Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, AdaBoost, and Naive Bayes is carried out on the MaNLI (Malayalam Natural Language Inference) dataset. Different lexical and surface-level features are used for binary and multiclass classification. As the dataset grows, the performance of feature-based classification drops, and a comparison of the feature-based models with deep learning approaches highlights this finding. The main focus is the feature-based analysis with 14 different features and its comparison, which is essential to any NLP classification problem.
Keywords: Textual entailment; natural language inference; Malayalam language; machine learning; deep learning
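
Illustrative sketch: the abstract mentions lexical and surface-level features computed over a premise and hypothesis pair but does not list them. The Python snippet below shows what a few such features might look like. The function names, the four features chosen, and the whitespace tokenizer are all assumptions made for illustration; they are not the paper's 14 features, and a Malayalam pipeline would likely need a language-aware tokenizer.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def surface_features(premise, hypothesis):
    """Return a few hypothetical lexical/surface features for one text pair."""
    p_tokens = tokenize(premise)
    h_tokens = tokenize(hypothesis)
    p_set, h_set = set(p_tokens), set(h_tokens)
    shared = p_set & h_set
    union = p_set | h_set
    return {
        # Number of distinct words the two texts have in common.
        "word_overlap": len(shared),
        # Jaccard similarity of the two vocabularies.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of distinct hypothesis words that also occur in the premise.
        "hypothesis_coverage": len(shared) / len(h_set) if h_set else 0.0,
        # Absolute difference in token counts.
        "length_difference": abs(len(p_tokens) - len(h_tokens)),
    }


if __name__ == "__main__":
    print(surface_features("a dog runs in the park", "a dog runs"))
```

Feature dictionaries of this kind could then be turned into vectors and passed to any of the classifiers named in the abstract, such as Logistic Regression or Random Forest.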