期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2021
卷号:12
期号:7
DOI:10.14569/IJACSA.2021.0120747
语种:English
出版社:Science and Information Society (SAI)
摘要:Compression of data in traditional relational database management systems significantly improves the system performance by decreasing the size of the data that results in less data transfer time within the communication environment and higher efficiency in I/O operations. The column-oriented database management systems should perform even better since each attribute is stored in a separate column, so that its sequential values are stored and accessed sequentially on the disk. That further increases the compression efficiency as the entire column is compressed/decompressed at once. The aim of this research is to determine if data compression could improve the performance of HBase, running on a small-sized Hadoop cluster, consisted of one name node and nine data nodes. Test scenario includes performing Insert and Select queries on multiple records with and without data compression. Four data compression algorithms are tested since they are natively supported by HBase - SNAPPY, LZO, LZ4 and GZ. Results show that data compression in HBase highly improves system performance in terms of storage saving. It shrinks data 5 to 10 times (depending on the algorithm) without any noticeable additional CPU load. That allows smaller but significantly faster SSD disks to be used as cluster’s primary data storage. Furthermore, the substantial decrease in the network traffic is an additional benefit with major impact on big data processing.
关键词:Column-oriented data stores; data compression; distributed non-relational databases; benchmarking column-oriented databases