Multimodal Retrieval With Contrastive Pretraining

dc.authorscopusid 55364564400
dc.authorscopusid 57289197300
dc.authorscopusid 57288694000
dc.authorscopusid 58353740700
dc.authorscopusid 6506505859
dc.contributor.author Alsan, H.F.
dc.contributor.author Yildiz, E.
dc.contributor.author Safdil, E.B.
dc.contributor.author Arslan, F.
dc.contributor.author Arsan, T.
dc.contributor.other Computer Engineering
dc.date.accessioned 2023-10-19T15:05:32Z
dc.date.available 2023-10-19T15:05:32Z
dc.date.issued 2021
dc.department-temp Alsan, H.F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Yildiz, E., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Safdil, E.B., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arslan, F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arsan, T., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey en_US
dc.description Kocaeli University; Kocaeli University Technopark en_US
dc.description 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 -- 25 August 2021 through 27 August 2021 -- 172175 en_US
dc.description.abstract In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist in multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network in which an image encoder and a text encoder map the multimodal data (images and text) to representation vectors, which are then used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset contains both images and captions and is used for multimodal training with a triplet loss. Before the dual-encoder training, we used a convolutional Siamese network to compute similarities between images (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether they can be used in practice. Finally, we investigated the performance of Siamese-assisted retrieval with the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE. en_US
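The abstract describes a dual-encoder network (a 2D convolutional image encoder and an LSTM text encoder) trained with a triplet loss so that images and their captions map to nearby representation vectors. The following is a minimal sketch of that setup in PyTorch; the layer widths, embedding dimension, vocabulary size, and margin are illustrative assumptions rather than the authors' reported configuration, and the Siamese contrastive-pretraining stage and BLEU evaluation from the paper are not reproduced here.

# Minimal dual-encoder sketch (PyTorch). Sizes and the margin are assumptions
# for illustration, not the configuration reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """2D convolutional encoder mapping an image to a unit-norm representation vector."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, images):                  # images: (B, 3, H, W)
        h = self.conv(images).flatten(1)        # (B, 64)
        return F.normalize(self.fc(h), dim=-1)  # (B, embed_dim)

class TextEncoder(nn.Module):
    """LSTM encoder mapping a tokenized caption to a vector of the same size."""
    def __init__(self, vocab_size=10000, embed_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.lstm = nn.LSTM(128, embed_dim, batch_first=True)

    def forward(self, tokens):                  # tokens: (B, T) integer ids
        _, (h, _) = self.lstm(self.embed(tokens))
        return F.normalize(h[-1], dim=-1)       # last hidden state as the caption embedding

# Triplet loss pulls an image toward its own caption and pushes it away from a
# mismatched caption; the resulting vectors support similarity-based retrieval.
img_enc, txt_enc = ImageEncoder(), TextEncoder()
triplet = nn.TripletMarginLoss(margin=0.2)

images = torch.randn(4, 3, 64, 64)              # toy batch of images
pos_caps = torch.randint(0, 10000, (4, 12))     # matching captions (token ids)
neg_caps = torch.randint(0, 10000, (4, 12))     # mismatched captions

loss = triplet(img_enc(images), txt_enc(pos_caps), txt_enc(neg_caps))
loss.backward()

Image-to-text retrieval then amounts to ranking caption embeddings by similarity (e.g., cosine similarity) to a query image embedding, in line with the similarity-based retrieval described in the abstract.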
dc.identifier.citationcount 1
dc.identifier.doi 10.1109/INISTA52262.2021.9548414 en_US
dc.identifier.isbn 9781665436038
dc.identifier.scopus 2-s2.0-85116673208 en_US
dc.identifier.uri https://doi.org/10.1109/INISTA52262.2021.9548414
dc.identifier.uri https://hdl.handle.net/20.500.12469/4941
dc.khas 20231019-Scopus en_US
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings en_US
dc.relation.publicationcategory Conference Item - International - Institutional Academic Staff en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.scopus.citedbyCount 4
dc.subject Convolutional Networks en_US
dc.subject Deep Learning en_US
dc.subject Long-Short Term Memory (LSTM) en_US
dc.subject Multimodal Data en_US
dc.subject Pretraining en_US
dc.subject Siamese networks en_US
dc.subject Triplet loss en_US
dc.subject Brain en_US
dc.subject Computer vision en_US
dc.subject Convolution en_US
dc.subject Convolutional neural networks en_US
dc.subject Deep neural networks en_US
dc.subject Network coding en_US
dc.subject Convolutional networks en_US
dc.subject Data retrieval en_US
dc.subject Deep learning en_US
dc.subject Image texts en_US
dc.subject Long-short term memory en_US
dc.subject Multi-modal en_US
dc.subject Multi-modal data en_US
dc.subject Pre-training en_US
dc.subject Siamese network en_US
dc.subject Long short-term memory en_US
dc.title Multimodal Retrieval With Contrastive Pretraining en_US
dc.type Conference Object en_US
dspace.entity.type Publication
relation.isAuthorOfPublication 7959ea6c-1b30-4fa0-9c40-6311259c0914
relation.isAuthorOfPublication.latestForDiscovery 7959ea6c-1b30-4fa0-9c40-6311259c0914
relation.isOrgUnitOfPublication fd8e65fe-c3b3-4435-9682-6cccb638779c
relation.isOrgUnitOfPublication.latestForDiscovery fd8e65fe-c3b3-4435-9682-6cccb638779c

Files

Original bundle

Name: 4941.pdf
Size: 2.13 MB
Format: Adobe Portable Document Format
Description: Tam Metin / Full Text