首页    期刊浏览 2024年12月02日 星期一
登录注册

文章基本信息

  • 标题:Automatic Extraction Of Malay Compound Nouns Using A Hybrid Of Statistical And Machine Learning Methods
  • 其他标题:Automatic Extraction Of Malay Compound Nouns Using A Hybrid Of Statistical And Machine Learning Methods
  • 本地全文:下载
  • 作者:Muneer A. S. Hazaa ; Nazlia Omar ; Fadl Mutaher Ba-Alwi
  • 期刊名称:International Journal of Electrical and Computer Engineering
  • 电子版ISSN:2088-8708
  • 出版年度:2016
  • 卷号:6
  • 期号:3
  • 页码:925-935
  • DOI:10.11591/ijece.v6i3.pp925-935
  • 语种:English
  • 出版社:Institute of Advanced Engineering and Science (IAES)
  • 摘要:Identifying of compound nouns is important for a wide spectrum of applications in the field of natural language processing such as machine translation and information retrieval. Extraction of compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting Noun compounds from Malay text corpora. First, we present the empirical results of sixteen statistical association measures of Malay compound nouns extraction. Second, we introduce the possibility of integrating multiple association measures. Third, this work also provides a standard dataset intended to provide a common platform for evaluating research on the identification compound Nouns in Malay language. The standard data set contains 7,235 unique N-N candidates, 2,970 of them are N-N compound nouns collocations. The extraction algorithms are evaluated against this reference data set. The experimental results demonstrate that a group of association measures (T-test , Piatersky-Shapiro (PS) , C_value, FGM and rank combination method) are the best association measure and outperforms the other association measures for collocations in the Malay corpus. Finally, we describe several classification methods for combining association measures scores of the basic measures, followed by their evaluation. Evaluation results show that classification algorithms significantly outperform individual association measures. Experimental results obtained are quite satisfactory in terms of the Precision, Recall and F-score.
  • 其他摘要:Identifying of compound nouns is important for a wide spectrum of applications in the field of natural language processing such as machine translation and information retrieval. Extraction of compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting Noun compounds from Malay text corpora. First, we present the empirical results of sixteen statistical association measures of Malay compound nouns extraction. Second, we introduce the possibility of integrating multiple association measures. Third, this work also provides a standard dataset intended to provide a common platform for evaluating research on the identification compound Nouns in Malay language. The standard data set contains 7,235 unique N-N candidates, 2,970 of them are N-N compound nouns collocations. The extraction algorithms are evaluated against this reference data set. The experimental results demonstrate that a group of association measures (T-test , Piatersky-Shapiro (PS) , C_value, FGM and rank combination method) are the best association measure and outperforms the other association measures for collocations in the Malay corpus. Finally, we describe several classification methods for combining association measures scores of the basic measures, followed by their evaluation. Evaluation results show that classification algorithms significantly outperform individual association measures. Experimental results obtained are quite satisfactory in terms of the Precision, Recall and F-score.
  • 关键词:COMPUTER SCIENCE; TEXT MINING; NATURAL LANGAUGE PROCESSING;Malay Compound Nouns ;Statistical Methods ;Machine Learning Methods
国家哲学社会科学文献中心版权所有