Journal:International Journal of Computer Trends and Technology
Electronic ISSN:2231-2803
Publication year:2018
Volume:56
Issue:1
Pages:6-20
DOI:10.14445/22312803/IJCTT-V56P102
Publisher:Seventh Sense Research Group
Abstract:The join operator in relational databases is one of the most I/O-intensive operations. The large size of the input relations makes it hard to fit them entirely in RAM during join processing. The relations are therefore processed in chunks inside a RAM buffer of limited size. The idea behind a successful join algorithm is to make the most efficient use of the limited-size buffer to minimize the number of I/Os. The hash join algorithm has been popular due to its relatively low I/O costs compared to other methods. In this paper we make the observation that the performance of the hash join can be dramatically improved if we take advantage of skewed distributions and missing values in join attributes. We propose the filtered hash join (FHjoin), which filters out tuples of the input relations during the partitioning phase of the hash join to minimize the work left for the join phase. The results show that FHjoin can outperform the hybrid hash join by up to a factor of 4 in terms of total execution time when the data is heavily skewed.
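
The abstract does not say how FHjoin decides which tuples to filter, so the following is only a minimal in-memory sketch of the general idea of filtering during partitioning, not the paper's algorithm. It rests on two assumptions of ours: a tuple with a missing join value (`None`) can never match in an inner equi-join, and a tuple of the second relation can be dropped if its key was never seen while partitioning the first. The names `partition`, `filtered_hash_join`, and `keep` are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n_parts, keep=None):
    """Hash-partition rows on `key`, dropping tuples that cannot join.

    Assumption (ours, not the paper's): rows are dicts, a missing join
    value is None, and `keep` is the set of keys seen in the other input.
    """
    parts = [[] for _ in range(n_parts)]
    seen = set()
    for row in rows:
        k = row[key]
        if k is None:
            # Missing join value: cannot match in an inner equi-join.
            continue
        if keep is not None and k not in keep:
            # No partner in the other relation, so skip it early.
            continue
        parts[hash(k) % n_parts].append(row)
        seen.add(k)
    return parts, seen


def filtered_hash_join(r, s, key, n_parts=4):
    """Inner equi-join of r and s on `key` with filtering while partitioning."""
    r_parts, r_keys = partition(r, key, n_parts)
    s_parts, _ = partition(s, key, n_parts, keep=r_keys)
    out = []
    # Join phase: build a hash table per partition of r, probe with s.
    for r_part, s_part in zip(r_parts, s_parts):
        table = defaultdict(list)
        for row in r_part:
            table[row[key]].append(row)
        for row in s_part:
            for match in table.get(row[key], ()):
                out.append((match, row))
    return out
```

In a disk-based setting the exact key set would likely be too large for the buffer, so a compact approximate structure would have to stand in for `r_keys`; that choice is outside what the abstract states.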