Title: Multimodal Retrieval With Contrastive Pretraining
Authors: Alsan, H. F.; Yildiz, E.; Safdil, E. B.; Arslan, F.; Arsan, T.
Date issued: 2021
Date available: 2023-10-19
Type: Conference Object
Conference: 2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA 2021), 25-27 August 2021
Pages: 172-175
ISBN: 9781665436038
DOI: 10.1109/INISTA52262.2021.9548414 (https://doi.org/10.1109/INISTA52262.2021.9548414)
Handle: https://hdl.handle.net/20.500.12469/4941
Scopus ID: 2-s2.0-85116673208
Sponsors: Kocaeli University; Kocaeli University Technopark
Language: English
Rights: info:eu-repo/semantics/closedAccess

Author keywords: Convolutional Networks; Deep Learning; Long Short-Term Memory (LSTM); Multimodal Data; Pretraining; Siamese networks; Triplet loss
Indexed keywords: Brain; Computer vision; Convolution; Convolutional neural networks; Deep neural networks; Network coding; Convolutional networks; Data retrieval; Deep learning; Image texts; Long short-term memory; Multi-modal; Multi-modal data; Pre-training; Siamese network; Triplet loss

Abstract: In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist in multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network whose image and text encoders map multimodal data (images and text) to representation vectors, which are then used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset, which contains both images and captions, is used for multimodal training with triplet loss. Before the dual-encoder training, we use a convolutional Siamese network to compute similarities between images (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether they can be used in practice. Finally, we investigate the performance of Siamese-assisted retrieval with the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE.
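
The abstract describes a dual-encoder architecture, a 2D convolutional image encoder and an LSTM text encoder, trained with triplet loss on MS-COCO image-caption pairs. The paper's code is not part of this record; the sketch below is a minimal, hypothetical PyTorch rendering of that setup. The class names (ImageEncoder, TextEncoder), layer sizes, and the roll-based negative sampling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """2D convolutional encoder: image -> L2-normalized embedding.
    Hypothetical sizes; the paper does not specify the architecture."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, images):                 # (B, 3, H, W)
        h = self.conv(images).flatten(1)       # (B, 64)
        return F.normalize(self.fc(h), dim=-1)

class TextEncoder(nn.Module):
    """LSTM encoder: caption token ids -> L2-normalized embedding."""
    def __init__(self, vocab_size, embed_dim=256, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, embed_dim)

    def forward(self, tokens):                 # (B, T)
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return F.normalize(self.fc(h_n[-1]), dim=-1)

# Triplet loss pulls a matching image/caption pair together and pushes a
# mismatched caption away by at least `margin` (margin value assumed).
img_enc, txt_enc = ImageEncoder(), TextEncoder(vocab_size=10_000)
triplet = nn.TripletMarginLoss(margin=0.2)

images = torch.randn(8, 3, 64, 64)            # toy batch of image/caption pairs
captions = torch.randint(0, 10_000, (8, 12))
anchors = img_enc(images)
positives = txt_enc(captions)                  # caption matching each image
negatives = positives.roll(1, dims=0)          # naive in-batch negatives
loss = triplet(anchors, positives, negatives)
loss.backward()
```

At retrieval time, both encoders embed queries and candidates into the same space, and nearest-neighbor search over the embeddings (e.g. by cosine similarity, since the vectors are normalized) returns the matches.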
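
The abstract also mentions pretraining a convolutional Siamese network to score image-image similarity before the dual-encoder training. A minimal sketch of that idea follows, assuming a shared convolutional branch and the classic pairwise contrastive loss of Hadsell et al.; SiameseNet, contrastive_loss, and the margin value are hypothetical choices, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    """Two inputs pass through one shared convolutional branch; the
    distance between the two outputs measures image similarity."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x1, x2):
        return self.branch(x1), self.branch(x2)

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pairwise contrastive loss: similar pairs (same=1) are pulled
    together; dissimilar pairs (same=0) are pushed beyond `margin`."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = SiameseNet()
x1, x2 = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
same = torch.tensor([1., 0., 1., 0.])   # 1 = similar image pair, 0 = dissimilar
z1, z2 = net(x1, x2)
loss = contrastive_loss(z1, z2, same)
loss.backward()
```

After this pretraining stage, the Siamese branch's similarity scores can assist the downstream image-to-text retrieval, which is the effect the paper evaluates with the BLEU score metric.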