Journal: International Journal of Software Engineering and Its Applications
Print ISSN: 1738-9984
Year: 2016
Volume: 10
Issue: 4
Pages: 1-12
DOI: 10.14257/ijseia.2016.10.4.01
Publisher: SERSC
Abstract: Diagnosing cancer with biomarkers is relatively simple because it uses blood samples, and it can detect cancer at an early stage at lower expense than other diagnostic methods. We use word embedding to find alternative biomarkers for the early diagnosis of ovarian cancer from a biomedical corpus. Word embedding is a word vector representation whose effectiveness has previously been demonstrated in the biomedical domain. First, we derived a low-dimensional representation of each biomarker embedding using Canonical Correlation Analysis (CCA), a powerful and flexible statistical technique for dimensionality reduction. Second, we identified similar pairs of biomarkers in the literature using the cosine similarity of their embeddings. To determine the clinical similarity between a pair of biomarkers, we used the area under the curve (AUC) of previously reported two-biomarker combinations. In the experiment, we confirmed that biomarker pairs with high embedding similarity were also highly correlated clinically: the average actual AUC correlation for the top 10% of pairs was 0.710.
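The pair-ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that each biomarker already has a low-dimensional embedding vector; the CCA step is not shown. The marker names, the toy vectors, and the helper names `cosine_similarity` and `rank_pairs` are hypothetical and are not taken from the paper.

```python
import itertools

import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_pairs(embeddings):
    """Score every unordered pair of markers and sort by descending similarity."""
    pairs = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in itertools.combinations(sorted(embeddings), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy 3-dimensional embeddings (illustrative values only, not real data).
toy_embeddings = {
    "marker_a": np.array([1.0, 0.0, 0.0]),
    "marker_b": np.array([0.9, 0.1, 0.0]),
    "marker_c": np.array([0.0, 1.0, 0.0]),
}

ranked = rank_pairs(toy_embeddings)
for a, b, score in ranked:
    print(f"{a} / {b}: {score:.3f}")
```

In a setting like the one the abstract describes, the top-ranked pairs would then be compared against clinical AUC values; that comparison depends on data not given here and is left out of the sketch.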