首页    期刊浏览 2024年12月03日 星期二
登录注册

文章基本信息

  • 标题:Leveraging of Weighted Ensemble Technique for Identifying Medical Concepts from Clinical Texts at Word and Phrase Level
  • 本地全文:下载
  • 作者:Dipankar Das ; Krishna Sharma
  • 期刊名称:Computer Science & Information Technology
  • 电子版ISSN:2231-5403
  • 出版年度:2021
  • 卷号:11
  • 期号:12
  • 语种:English
  • 出版社:Academy & Industry Research Collaboration Center (AIRCC)
  • 摘要:Concept identification from medical texts becomes important due to digitization. However, it is not always feasible to identify all such medical concepts manually. Thus, in the present attempt, we have applied five machine learning classifiers (Support Vector Machine, K-Nearest Neighbours, Logistic Regression, Random Forest and Naïve Bayes) and one deep learning classifier (Long Short Term Memory) to identify medical concepts by training a total of 27.383K sentences. In addition, we have also developed a rule based phrase identification module to help the existing classifiers for identifying multi- word medical concepts. We have employed word2vec technique for feature extraction and PCA and T- SNE for conducting ablation study over various features to select important ones. Finally, we have adopted two different ensemble approaches, stacking and weighted sum to improve the performance of the individual classifier and significant improvements were observed with respect to each of the classifiers. It has been observed that phrase identification module plays an important role when dealing with individual classifier in identifying higher order ngram medical concepts. Finally, the ensemble approach enhances the results over SVM that was showing initial improvement even after the application of phrase based module.
  • 关键词:Medical Concepts;Phrase Identification;Ensemble;Machine Learning
国家哲学社会科学文献中心版权所有