Publisher: International Institute for Science, Technology Education
Abstract: Text compression methods that map the original text directly into the binary domain are attractive for compressing English text files. This paper proposes an intermediate mapping scheme in which the original English text is first transformed to the decimal domain and then to the binary domain. Each two-decimal-digit value in the resulting intermediate decimal file is the index of the location of a character found in the original text. If an already indexed character is seen again, it is replaced by its previously assigned decimal index. The decimal file is converted into the binary domain by assigning each decimal digit a 4-bit weighted code, akin to a BCD code, according to its frequency of occurrence. The assigned codes aim to generate an equivalent binary file whose entropy is as close as possible to that of the original file. Thereafter, any conventional compression algorithm, such as the Lempel-Ziv algorithms, can be applied to the generated binary file. The obtained compression ratios outperform those obtained when the same compression algorithm is applied to binary files generated either by direct mapping of the original text or by mapping the decimal file using Binary Coded Decimal (BCD) codes.
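The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: characters are indexed from 00 in order of first appearance (so at most 100 distinct symbols are supported), the frequency-based code assignment gives the most frequent digit the 4-bit code with the fewest 1 bits, and `zlib` (DEFLATE, which is LZ77-based) stands in for "any conventional compression algorithm". All function names are hypothetical.

```python
import zlib
from collections import Counter


def text_to_decimal(text):
    """Replace each character with the two-digit index of its first appearance."""
    index = {}
    pairs = []
    for ch in text:
        if ch not in index:
            if len(index) >= 100:
                # Assumption: two decimal digits limit us to 100 distinct symbols.
                raise ValueError("more than 100 distinct symbols")
            index[ch] = len(index)
        pairs.append(f"{index[ch]:02d}")
    return "".join(pairs), index


def assign_codes(decimal_str):
    """Map each decimal digit to a 4-bit code, ranked by digit frequency.

    Assumed rule (not taken from the paper): the most frequent digit gets
    the code with the fewest 1 bits; ties are broken by digit value.
    """
    counts = Counter(decimal_str)
    ranked = sorted("0123456789", key=lambda d: (-counts[d], d))
    all_codes = sorted(
        (format(v, "04b") for v in range(16)),
        key=lambda c: (c.count("1"), c),
    )
    return dict(zip(ranked, all_codes))


def decimal_to_bytes(decimal_str, codes):
    """Concatenate the 4-bit codes and pack them into bytes.

    Every character yields two digits, i.e. exactly 8 bits, so no padding
    is needed.
    """
    bits = "".join(codes[d] for d in decimal_str)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def compress(text):
    """Run the full text -> decimal -> binary -> LZ-style compression chain."""
    decimal_str, index = text_to_decimal(text)
    codes = assign_codes(decimal_str)
    packed = decimal_to_bytes(decimal_str, codes)
    return zlib.compress(packed, 9), index, codes
```

For the input `"abca"`, `text_to_decimal` yields the decimal string `00010200`: the second `a` reuses index `00` rather than receiving a new one. A decoder would need both the character index table and the digit-to-code table, which is why `compress` returns them alongside the compressed bytes.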