Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Online ISSN: 2277-128X
Year: 2012
Volume: 2
Issue: 6
Publisher: S.S. Mishra
Abstract: Compression in traditional database systems is known to improve performance significantly [1, 4]. It reduces the size of the data and improves I/O performance in three ways: it reduces seek times (the data are stored nearer to each other), it reduces transfer times (there is less data to transfer), and it increases the buffer hit rate (a larger fraction of the database fits in the buffer pool). For I/O-limited queries, the I/O savings often compensate for the CPU overhead of decompression. We revisit this literature on compression in the context of column-oriented database systems. Compared with row-oriented architectures, storing data in columns offers a number of opportunities for compression algorithms to improve performance. In a column-oriented database, compression schemes that encode multiple values at once are natural. In a row-oriented database, such schemes do not work as well because each attribute is stored as part of an entire tuple, so combining the same attribute from different tuples into one value would require some way to "mix" tuples. Compression techniques for row-stores often employ dictionary schemes, in which a dictionary maps wide values in the attribute domain to smaller codes. For example, a simple dictionary for a string-typed column of colors might map "blue" to 0, "yellow" to 1, "green" to 2, and so on [1, 2]. Some of these schemes use prefix coding based on symbol frequencies (e.g., Huffman encoding [46]), or express values as small differences from some frame of reference and remove leading nulls from them.
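The dictionary scheme described in the abstract can be illustrated with a minimal sketch. It assumes codes are assigned in order of first appearance, which reproduces the "blue" → 0, "yellow" → 1, "green" → 2 mapping from the example. The function names `dictionary_encode` and `dictionary_decode` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Hypothetical helper: map each distinct value to a small integer code,
    assigning codes in order of first appearance."""
    dictionary: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # next unused code
        codes.append(dictionary[value])
    return dictionary, codes


def dictionary_decode(dictionary: Dict[str, int], codes: List[int]) -> List[str]:
    """Hypothetical helper: invert the dictionary and map codes back to values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


colors = ["blue", "yellow", "green", "blue", "green"]
dictionary, codes = dictionary_encode(colors)
print(dictionary)  # {'blue': 0, 'yellow': 1, 'green': 2}
print(codes)       # [0, 1, 2, 0, 2]
assert dictionary_decode(dictionary, codes) == colors
```

Each wide string is stored once in the dictionary. Every row then carries only a small integer, and a small integer can in turn be stored in just a few bits.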
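The abstract notes that schemes encoding multiple values at once are natural in a column store. Run-length encoding is one such scheme (our choice of example; the abstract does not name it). It collapses a run of identical adjacent values into a single (value, run length) pair, and this only works well when values of the same attribute sit next to each other, as they do in a column. The sketch below uses hypothetical helper names and an illustrative, already-sorted column.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(column: List[T]) -> List[Tuple[T, int]]:
    """Hypothetical helper: collapse runs of equal adjacent values into (value, length) pairs."""
    runs: List[Tuple[T, int]] = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    """Hypothetical helper: expand (value, length) pairs back into the column."""
    out: List[T] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


# Illustrative sorted column: equal values are stored contiguously.
color_column = ["blue", "blue", "blue", "green", "green", "yellow", "yellow", "yellow", "yellow"]
runs = rle_encode(color_column)
print(runs)  # [('blue', 3), ('green', 2), ('yellow', 4)]
assert rle_decode(runs) == color_column
```

In a row store, the same colors would be interleaved with the other attributes of each tuple. No adjacent runs would form unless tuples were "mixed" as the abstract describes.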
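The last technique in the abstract stores values as small differences from a frame of reference and removes their leading nulls (zero bits). A minimal sketch follows. It assumes the reference is the column minimum and that every delta is packed at a single fixed bit width. Both choices are assumptions for illustration, and the helper names are hypothetical.

```python
from typing import List, Tuple


def for_encode(values: List[int]) -> Tuple[int, int, List[int]]:
    """Hypothetical helper: subtract a reference (assumed: the minimum) from every value
    and compute the fewest bits that hold the largest delta (at least 1)."""
    reference = min(values)
    deltas = [v - reference for v in values]
    bit_width = max(max(deltas).bit_length(), 1)
    return reference, bit_width, deltas


def pack_bits(deltas: List[int], bit_width: int) -> bytes:
    """Hypothetical helper: concatenate deltas at bit_width bits each, dropping leading zeros."""
    acc = 0
    for i, d in enumerate(deltas):
        acc |= d << (i * bit_width)
    n_bytes = (len(deltas) * bit_width + 7) // 8
    return acc.to_bytes(n_bytes, "little")


def unpack_bits(data: bytes, bit_width: int, count: int) -> List[int]:
    """Hypothetical helper: read count deltas of bit_width bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(acc >> (i * bit_width)) & mask for i in range(count)]


values = [1000, 1003, 1001, 1007, 1002]
reference, bit_width, deltas = for_encode(values)
packed = pack_bits(deltas, bit_width)
print(reference, bit_width, deltas)  # 1000 3 [0, 3, 1, 7, 2]
print(len(packed))                   # 2 (bytes), versus 20 bytes as five 32-bit integers
restored = [reference + d for d in unpack_bits(packed, bit_width, len(values))]
assert restored == values
```

This sketch does not handle an empty column, because `min` raises on an empty list.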