Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2020
Volume: 11
Issue: 10
DOI: 10.14569/IJACSA.2020.0111065
Publisher: Science and Information Society (SAI)
Abstract: DNA sequencing has recently generated a very large volume of data in digital format. These data can only be compressed, processed and classified using automated tools, which have been employed in biological experiments. In this work, we are interested in classifying particular regions of the C. elegans genome: a recently described group of transposable elements (TEs) called Miniature Inverted-repeat Transposable Elements (MITEs). We focus in particular on four MITE families (Cele1, Cele2, Cele14 and Cele42). These elements show distinct chromosomal distribution patterns and conserved, family-specific copy numbers on the six chromosomes of C. elegans. It is therefore necessary to define specific chromosomal domains and to characterize the potential relationship between MITEs and Tc/mariner elements, a relationship that makes it difficult to distinguish the MITE classes from the Tc class. To solve this problem, and more precisely to identify these TEs, this study classifies and compresses the data using an efficient classifier model. Applying the model involves four steps. First, the DNA sequences are mapped to scalograms. Second, characteristic motifs are extracted to obtain a genomic signature. Third, the MITE database is randomly split into two sets: 70% for training and 30% for testing. Finally, the scalograms are classified with a transfer learning approach based on pre-trained models such as VGGNet. By recognizing the correct characteristic patterns, the model reached an overall accuracy of 97.11% in classifying these TE samples. Our approach also made it possible to classify and identify the MITE classes relative to the Tc class despite their strong similarity. Extracting the features and characteristic patterns also considerably reduced the volume of the data.
Keywords: DNA scalograms; genomic signature; classification; deep learning; transfer learning; VGGNet; accuracy
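The first and third steps of the pipeline in the abstract (mapping a DNA sequence to a scalogram, then splitting the samples 70%/30%) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which numeric encoding, wavelet or scales were used, so the purine/pyrimidine (+1/-1) encoding, the Ricker (Mexican-hat) wavelet, the scale list and the function names `encode`, `scalogram` and `split_70_30` are all hypothetical choices.

```python
import math
import random

# ASSUMPTION: hypothetical numeric encoding, not taken from the paper.
# Purines (A, G) -> +1, pyrimidines (C, T) -> -1.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def encode(seq):
    """Map a DNA string to a numeric signal; unknown bases (e.g. N) map to 0."""
    return [PURINE_PYRIMIDINE.get(base, 0.0) for base in seq.upper()]


def ricker(t):
    """Mexican-hat (Ricker) mother wavelet, left unnormalised for simplicity."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def scalogram(signal, scales):
    """Return a len(scales) x len(signal) matrix of squared CWT coefficients.

    Each coefficient is a direct correlation of the signal with the scaled,
    shifted wavelet psi((k - b) / a) / sqrt(a), truncated to |k - b| <= 5a.
    """
    n = len(signal)
    rows = []
    for a in scales:
        half = int(math.ceil(5 * a))  # window half-width for this scale
        row = []
        for b in range(n):
            coef = 0.0
            for k in range(max(0, b - half), min(n, b + half + 1)):
                coef += signal[k] * ricker((k - b) / a)
            coef /= math.sqrt(a)
            row.append(coef * coef)  # energy at (scale a, position b)
        rows.append(row)
    return rows


def split_70_30(items, seed=0):
    """Shuffle a copy of items with a fixed seed and split it 70% / 30%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    img = scalogram(encode("ACGTTGCAAGGTCCAT"), scales=[1, 2, 4, 8])
    train, test = split_70_30(range(100), seed=42)
    print(len(img), len(img[0]), len(train), len(test))  # 4 16 70 30
```

The direct O(n x window) correlation keeps the sketch dependency-free; a real pipeline would more likely use a wavelet library and render each matrix as an image. The final step, fine-tuning a pre-trained VGG network on those images, is omitted here because it depends on a deep-learning framework.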