Multimodal Retrieval With Contrastive Pretraining

dc.authorscopusid: 55364564400
dc.authorscopusid: 57289197300
dc.authorscopusid: 57288694000
dc.authorscopusid: 58353740700
dc.authorscopusid: 6506505859
dc.contributor.author: Alsan, H.F.
dc.contributor.author: Yildiz, E.
dc.contributor.author: Safdil, E.B.
dc.contributor.author: Arslan, F.
dc.contributor.author: Arsan, T.
dc.date.accessioned: 2023-10-19T15:05:32Z
dc.date.available: 2023-10-19T15:05:32Z
dc.date.issued: 2021
dc.department-temp: Alsan, H.F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Yildiz, E., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Safdil, E.B., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arslan, F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arsan, T., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey
dc.description: Kocaeli University; Kocaeli University Technopark
dc.description: 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 -- 25 August 2021 through 27 August 2021 -- pp. 172-175
dc.description.abstract: In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network, with an image encoder and a text encoder, that encodes multimodal data (images and text) into representation vectors. These representation vectors are used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset contains both images and captions, and it is used for multimodal training with triplet loss. We use a convolutional Siamese network to compute similarities between images before the dual-encoder training (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether they can be used in practice. Finally, we investigate the performance of Siamese-assisted retrieval using the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE.
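The triplet-loss objective named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions: the toy embedding vectors, the margin value, and the function names below are ours, not taken from the paper, and a real implementation would compute the loss over learned encoder outputs in an autodiff framework.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on similarities: the matching (anchor, positive)
    pair should be more similar than the mismatched (anchor, negative)
    pair by at least `margin`, otherwise a penalty is incurred."""
    pos_sim = cosine_similarity(anchor, positive)
    neg_sim = cosine_similarity(anchor, negative)
    return max(0.0, margin - pos_sim + neg_sim)

# Toy embeddings: an image vector, its matching caption's vector,
# and an unrelated caption's vector (all hypothetical values).
image_vec = [1.0, 0.0, 0.5]
caption_vec = [0.9, 0.1, 0.4]   # close to the image embedding
other_vec = [0.0, 1.0, -0.2]    # far from the image embedding

# Well-separated triplet: the margin is already satisfied, loss is zero.
loss_match = triplet_loss(image_vec, caption_vec, other_vec)
```

Swapping the positive and negative arguments yields a positive loss, which is the gradient signal that pushes matching image-caption pairs together and mismatched pairs apart during dual-encoder training.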
dc.identifier.citation: 1
dc.identifier.doi: 10.1109/INISTA52262.2021.9548414
dc.identifier.isbn: 9781665436038
dc.identifier.scopus: 2-s2.0-85116673208
dc.identifier.uri: https://doi.org/10.1109/INISTA52262.2021.9548414
dc.identifier.uri: https://hdl.handle.net/20.500.12469/4941
dc.institutionauthor: Arsan, Taner
dc.khas: 20231019-Scopus
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.relation.ispartof: 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Convolutional Networks
dc.subject: Deep Learning
dc.subject: Long Short-Term Memory (LSTM)
dc.subject: Multimodal Data
dc.subject: Pretraining
dc.subject: Siamese Networks
dc.subject: Triplet Loss
dc.subject: Brain
dc.subject: Computer Vision
dc.subject: Convolution
dc.subject: Convolutional Neural Networks
dc.subject: Deep Neural Networks
dc.subject: Network Coding
dc.subject: Data Retrieval
dc.subject: Image Texts
dc.title: Multimodal Retrieval With Contrastive Pretraining
dc.type: Conference Object
dspace.entity.type: Publication
relation.isAuthorOfPublication: 7959ea6c-1b30-4fa0-9c40-6311259c0914
relation.isAuthorOfPublication.latestForDiscovery: 7959ea6c-1b30-4fa0-9c40-6311259c0914