Journal title: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Publication year: 2021
Volume: 12
Issue: 11
DOI: 10.14569/IJACSA.2021.0121115
Language: English
Publisher: Science and Information Society (SAI)
Abstract: DNA (deoxyribonucleic acid) profiling involves analyzing sequences of individual or mixed DNA profiles to identify the persons to whom these profiles belong. DNA profiling is used in important applications such as paternity testing and, in forensic science, identifying persons at a crime scene. Finding the number of contributors in a DNA mixture is a major task in DNA profiling, with challenges caused by allele dropout, stutter, blobs, and noise. The existing methods for finding the number of unknowns in a DNA mixture suffer from issues including computational complexity and limited accuracy in estimating the number of unknowns. Machine learning has recently received attention in this area, but with limited success. Considerably more work is needed to improve the robustness and accuracy of these methods. Our research aims to advance the state of the art in this area. Specifically, in this paper, we investigate the performance of six machine learning algorithms -- K-Nearest Neighbors (KNN), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Stochastic Gradient Descent (SGD), and Gaussian Naïve Bayes (GNB) -- applied to a publicly available dataset called PROVEDIt, containing mixtures with up to five contributors. We evaluate the algorithmic performance using confusion matrices and four performance metrics, namely accuracy, F1-score, recall, and precision. The results show that LR provides the highest accuracy of 95% for mixtures with five contributors.
Keywords: Machine learning; DNA profiling; DNA mixtures; forensic science
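
Note: the evaluation described in the abstract (a confusion matrix plus accuracy, precision, recall, and F1-score over classes for one to five contributors) can be illustrated with a minimal sketch. The labels below are hypothetical toy values, not PROVEDIt data or results from the paper, and macro averaging over classes is an assumption, since the abstract does not state which averaging the authors used.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_metrics(matrix):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels: true vs. predicted number of contributors.
labels = [1, 2, 3, 4, 5]
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = macro_metrics(matrix)

for row in matrix:
    print(row)
print(scores)
```

In this toy example the off-diagonal entries show a two-contributor mixture predicted as three and a four-contributor mixture predicted as five, which is the kind of per-class error the confusion matrix exposes and a single accuracy figure hides.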