Abstract: In the traditional vector of locally aggregated descriptors (VLAD) method, the final VLAD vector is formed by summing the residuals between each descriptor and its corresponding visual word. The norms of these residuals vary significantly, so descriptors contribute unequally to the VLAD vector, which can cause "visual burstiness". To address this problem, we assign a different weight to each residual so that the contributions of the descriptors to the VLAD vector become more balanced. In addition, the traditional VLAD method uses only local gradient features of images and therefore has limited discriminative power. In this paper, local color features are extracted and incorporated into the VLAD method. Moreover, we fuse deep features with multiple VLAD vectors built from local gradient and color information. Finally, to reduce running time and improve retrieval accuracy, principal component analysis (PCA) and whitening are applied to the VLAD vectors. The proposed method is evaluated on three benchmark datasets: Holidays, Ukbench, and Oxford5k. Experimental results show that it achieves good performance.
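The sketch below illustrates the two computational steps named above: weighted residual aggregation into a VLAD vector, and PCA whitening of the resulting vectors. The abstract does not specify the weighting scheme, so the weight used here (the inverse of each residual's L2 norm, which gives every descriptor a unit-norm contribution) is an assumption, as are the function names `weighted_vlad`, `fit_pca_whitening` and `apply_pca_whitening`. Color features, deep features and the fusion step are not modeled. Treat this as an illustration under those assumptions, not as the paper's implementation.

```python
import numpy as np


def weighted_vlad(descriptors, centers, eps=1e-12):
    """VLAD aggregation with a per-residual weight (illustrative sketch).

    Assumption: each residual is weighted by 1 / ||residual||, so every
    descriptor contributes a unit-norm vector. The paper's own weighting
    scheme may differ.
    """
    x = np.asarray(descriptors, dtype=np.float64)   # (n, d) local descriptors
    c = np.asarray(centers, dtype=np.float64)       # (k, d) visual words
    k, d = c.shape
    # Hard-assign each descriptor to its nearest visual word (squared L2).
    dists = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for xi, j in zip(x, nearest):
        r = xi - c[j]                                # residual to its visual word
        w = 1.0 / (np.linalg.norm(r) + eps)          # hypothetical weight
        v[j] += w * r
    v = v.ravel()                                    # k*d-dimensional VLAD vector
    return v / (np.linalg.norm(v) + eps)             # global L2 normalization


def fit_pca_whitening(X, n_components, eps=1e-12):
    """Learn a PCA-whitening projection from training VLAD vectors (rows of X)."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s ** 2 / max(X.shape[0] - 1, 1)            # variance along each direction
    # Scale each kept direction by 1/sqrt(variance) so outputs have unit variance.
    P = vt[:n_components] / np.sqrt(var[:n_components] + eps)[:, None]
    return mean, P


def apply_pca_whitening(X, mean, P, renormalize=True, eps=1e-12):
    """Project, whiten and (optionally) L2-renormalize VLAD vectors."""
    Y = (np.asarray(X, dtype=np.float64) - mean) @ P.T
    if renormalize:
        Y = Y / (np.linalg.norm(Y, axis=-1, keepdims=True) + eps)
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(16, 8))               # 16 synthetic visual words
    db = np.stack([weighted_vlad(rng.normal(size=(100, 8)), centers)
                   for _ in range(40)])              # 40 synthetic images
    mean, P = fit_pca_whitening(db, n_components=32)
    print(apply_pca_whitening(db, mean, P).shape)    # (40, 32)
```

With this weighting, a descriptor whose residual is ten times longer than another's still adds only a unit-norm vector, which is one way to keep a few large residuals from dominating the sum. PCA reduces the dimensionality (and hence matching time), while whitening equalizes the variance of the kept components before the final renormalization.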