Article Information

Title: A Cross-Modal Image and Text Retrieval Method Based on Efficient Feature Extraction and Interactive Learning CAE
Authors: Xiuye Yin; Liyong Chen
Journal: Scientific Programming
Print ISSN: 1058-9244
Publication year: 2022
Volume: 2022
DOI: 10.1155/2022/7314599
Language: English
Publisher: Hindawi Publishing Corporation
Abstract: Given the complexity of multimodal environments and the inability of existing shallow network structures to achieve high-precision image and text retrieval, a cross-modal image and text retrieval method combining efficient feature extraction with an interactive learning convolutional autoencoder (CAE) is proposed. First, to extract image and text features efficiently, the residual network convolution kernel is improved by incorporating two-dimensional principal component analysis (2DPCA) to extract image features, and text features are extracted using long short-term memory (LSTM) and word vectors. Then, cross-modal image and text retrieval is realized with the interactive learning CAE: the image and text features are fed into the two input terminals of the dual-modal CAE, respectively, and the image-text relationship model is obtained through interactive learning in the middle layer, which enables image-text retrieval. Finally, the proposed method is evaluated experimentally on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that the proposed method achieves accurate image retrieval and text retrieval: its mean average precision (MAP) exceeds 0.3, and the areas under its precision-recall (PR) curves are larger than those of the other comparison methods, demonstrating the method's applicability.
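The abstract says 2DPCA is incorporated into the residual network's convolution kernels, but it does not say how. For reference only, the following is a minimal NumPy sketch of standard 2DPCA as it is commonly defined. It computes the image covariance matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), keeps its top-d eigenvectors as the columns of X, and projects each image as Y_i = A_i X. It is not the authors' ResNet integration. The function names fit_2dpca and transform_2dpca are hypothetical.

```python
import numpy as np


def fit_2dpca(images: np.ndarray, d: int):
    """Fit standard 2DPCA on a stack of images with shape (M, h, w).

    Returns the mean image (h, w), a projection matrix X (w, d) whose
    columns are the top-d eigenvectors of the image covariance matrix,
    and the corresponding eigenvalues in descending order.
    """
    if images.ndim != 3:
        raise ValueError("expected images with shape (M, h, w)")
    m, h, w = images.shape
    if not 1 <= d <= w:
        raise ValueError("d must be between 1 and the image width")
    mean = images.mean(axis=0)
    centered = images - mean
    # Image covariance matrix G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), shape (w, w)
    g = np.einsum("mhi,mhj->ij", centered, centered) / m
    # G is symmetric, so eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
    eigvals, eigvecs = np.linalg.eigh(g)
    order = np.argsort(eigvals)[::-1][:d]
    return mean, eigvecs[:, order], eigvals[order]


def transform_2dpca(images: np.ndarray, x: np.ndarray) -> np.ndarray:
    # Feature matrix for each image: Y_i = A_i X, giving shape (M, h, d)
    return images @ x
```

For example, fitting on 20 random 8x6 "images" with d=3 gives an 8x3 feature matrix per image. The row dimension is preserved and only the column dimension is reduced. This is what distinguishes 2DPCA from classical PCA, which would first flatten each image into a 48-dimensional vector.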