Journal: International Journal of Education and Management Engineering (IJEME)
Print ISSN: 2305-3623
Electronic ISSN: 2305-8463
Publication year: 2017
Volume: 7
Issue: 5
Pages: 1-6
DOI: 10.5815/ijeme.2017.05.01
Publisher: MECS Publisher
Abstract: This paper introduces a simple and fast lossless compression algorithm, called CAD, for protein sequences. The algorithm is especially suited to compressing proteomes, the collections of all proteins expressed by an organism. It maintains a changing dictionary of actively used amino-acid residues and combines this adaptive dictionary with Huffman coding to achieve an average compression rate of 3.25 bits per symbol, better than most other protein-specific and general-purpose compression algorithms known to us. With an average compression ratio of 2.46:1 and an average compression speed of 1.32M residues/sec, our algorithm outperforms every other algorithm for compressing protein sequences in terms of the balance between compression time and compression rate.
Keywords: protein sequence compression; dictionary-based compression; Huffman encoding
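
Note on the figures in the abstract: the two headline numbers agree if each residue is assumed to occupy one 8-bit character before compression, since 8 / 3.25 is approximately 2.46. The sketch below illustrates that arithmetic and a plain static Huffman code over residue frequencies. It is not the CAD algorithm, whose adaptive dictionary is not specified in this record; the function names and the sample sequence are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(sequence):
    """Return {symbol: code length in bits} for a static Huffman code.

    Illustrative only: a plain Huffman code, not the adaptive-dictionary
    scheme (CAD) described in the abstract.
    """
    freq = Counter(sequence)
    if not freq:
        raise ValueError("sequence must not be empty")
    if len(freq) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in d1.items()}
        merged.update({s: d + 1 for s, d in d2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def bits_per_symbol(sequence):
    """Average code length, in bits, over all symbols of the sequence."""
    lengths = huffman_code_lengths(sequence)
    freq = Counter(sequence)
    total_bits = sum(freq[s] * lengths[s] for s in freq)
    return total_bits / len(sequence)


def compression_ratio(bps, raw_bits=8):
    """Ratio of raw size to compressed size, assuming raw_bits per symbol."""
    return raw_bits / bps


if __name__ == "__main__":
    # Figure reported in the abstract: 3.25 bits per symbol.
    print(f"{compression_ratio(3.25):.2f}:1")  # prints 2.46:1

    # Hypothetical residue string, not taken from the paper.
    sample = "MKTAYIAKQRQISFVKSHFSRQ"
    print(bits_per_symbol(sample))
```

A static code like this one needs the frequency table up front; the abstract's adaptive dictionary presumably avoids that by updating as residues are read, but the details are in the paper itself.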