Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Year: 2019
Volume: 10
Issue: 8
Pages: 512-517
Publisher: Science and Information Society (SAI)
Abstract: In the digital era, the number of documents grows day by
day, to the point where the volume becomes overwhelming. This
causes problems such as data congestion, difficulty in finding the
intended information, and difficulty in managing databases such as
MEDLINE, which stores documents from the biomedical field. This
research addresses the problem through text classification of
biomedical abstracts. Text classification is the process of
organizing documents into predefined classes. A standard text
classification framework consists of feature extraction, feature
selection, and classification stages. The dataset used in this
research is the Ohsumed dataset, a subset of the MEDLINE
database, from which a total of 11,566 abstracts were selected.
First, feature extraction is performed on the biomedical abstracts
to produce a list of unique features. All features in this list are
added to the lexicon of a multiword tokenizer for tokenizing
phrases and compound words. The biomedical texts are then
classified using a deep learning network, the Convolutional Neural
Network, an approach widely used in domains such as pattern
recognition and classification. The goal of classification is to
assign each document to its correct predefined class. The
Convolutional Neural Network achieved 54.79% average accuracy,
61.00% average precision, 60.00% average recall, and 60.50%
average F1-score. It is hoped that this research will benefit the
text classification field.
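As a rough consistency check on the reported figures, the sketch below computes the harmonic mean of the stated average precision and recall. It assumes the F1-score can be approximated this way; the abstract does not say how the per-class scores were averaged, and an F1 averaged across classes need not equal the F1 of the averaged precision and recall. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, expressed as fractions.
reported_precision = 0.6100
reported_recall = 0.6000

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.4f}")  # prints 0.6050
```

Under that assumption, the result agrees with the reported 60.50% average F1-score to two decimal places.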
Keywords: Convolutional neural network; biomedical text
classification; compound term; Ohsumed dataset
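The multiword tokenization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy longest-match strategy, the underscore separator, the lowercasing, and the names `build_lexicon` and `tokenize_multiword` are all assumptions made for illustration, and the example phrases are invented rather than taken from the Ohsumed dataset.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def build_lexicon(phrases: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Turn phrases such as 'heart failure' into lowercase token tuples."""
    return {tuple(p.lower().split()) for p in phrases if p.strip()}


def tokenize_multiword(tokens: Sequence[str],
                       lexicon: Set[Tuple[str, ...]],
                       separator: str = "_") -> List[str]:
    """Greedily merge the longest lexicon phrase starting at each position."""
    max_len = max((len(entry) for entry in lexicon), default=1)
    lowered = [t.lower() for t in tokens]
    merged: List[str] = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(lowered) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                merged.append(separator.join(lowered[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here; keep the single token.
            merged.append(lowered[i])
            i += 1
    return merged


# Hypothetical lexicon entries standing in for the extracted feature list.
lexicon = build_lexicon(["heart failure", "congestive heart failure"])
tokens = "Patients with congestive heart failure were studied".split()
result = tokenize_multiword(tokens, lexicon)
print(result)
```

Trying the longest phrase first keeps "congestive heart failure" as one token rather than splitting it into "congestive" and "heart_failure", which is one plausible way to preserve compound terms before classification.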