Multimodal Retrieval With Contrastive Pretraining

dc.contributor.author Alsan, H.F.
dc.contributor.author Yildiz, E.
dc.contributor.author Safdil, E.B.
dc.contributor.author Arslan, F.
dc.contributor.author Arsan, T.
dc.contributor.other Computer Engineering
dc.contributor.other 05. Faculty of Engineering and Natural Sciences
dc.contributor.other 01. Kadir Has University
dc.date.accessioned 2023-10-19T15:05:32Z
dc.date.available 2023-10-19T15:05:32Z
dc.date.issued 2021
dc.description Kocaeli University;Kocaeli University Technopark en_US
dc.description 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 -- 25 August 2021 through 27 August 2021 -- 172175 en_US
dc.description.abstract In this paper, we present multimodal data retrieval aided by contrastive pretraining. Our approach is to pretrain a contrastive network to assist multimodal retrieval tasks. We work with multimodal data consisting of image and caption (text) pairs. We present a dual-encoder deep neural network, with an image encoder and a text encoder, that maps multimodal data (images and text) to representation vectors. These representation vectors are used for similarity-based retrieval. The image encoder is a 2D convolutional network, and the text encoder is a recurrent neural network (Long Short-Term Memory). The MS-COCO 2014 dataset contains both images and captions, and it is used for multimodal training with triplet loss. We used a convolutional Siamese network to compute similarities between images before the dual-encoder training (contrastive pretraining). The advantage is that Siamese networks can aid retrieval, and we seek to show whether Siamese networks can be used in practice. Finally, we investigated the performance of Siamese-assisted retrieval with the BLEU score metric. We conclude that Siamese networks can help with image-to-text retrieval tasks. © 2021 IEEE. en_US
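The abstract describes a dual-encoder setup trained with triplet loss and used for similarity-based retrieval. A minimal sketch of that idea is below; it is not the paper's implementation. The encoders here are placeholder linear projections (the paper uses a 2D CNN for images and an LSTM for captions), and the feature dimensions (2048, 300), embedding size (128), and margin (0.2) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "encoders": in the paper these are a 2D convolutional network
# (images) and an LSTM (captions); simple linear projections stand in here.
W_img = rng.normal(size=(2048, 128))   # hypothetical image-feature dim 2048
W_txt = rng.normal(size=(300, 128))    # hypothetical text-feature dim 300

def encode_image(x):
    """Map raw image features to a unit-norm representation vector."""
    v = x @ W_img
    return v / np.linalg.norm(v)

def encode_text(x):
    """Map raw caption features to a unit-norm representation vector."""
    v = x @ W_txt
    return v / np.linalg.norm(v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on L2 distances between unit vectors:
    pull the matching caption toward the image, push a mismatch away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# One image with its true caption (positive) and a mismatched one (negative).
img = rng.normal(size=2048)
cap_pos = rng.normal(size=300)
cap_neg = rng.normal(size=300)

a = encode_image(img)
p = encode_text(cap_pos)
n = encode_text(cap_neg)

loss = triplet_loss(a, p, n)
print(f"triplet loss: {loss:.4f}")

# Similarity-based retrieval: rank candidate captions by cosine similarity
# to the image embedding (dot product, since all vectors are unit norm).
caption_bank = np.stack([p, n])
scores = caption_bank @ a
best = int(np.argmax(scores))
print("retrieved caption index:", best)
```

In the actual pipeline the encoders would be trained so that matching image–caption pairs score highest; with the untrained placeholders above, the ranking is arbitrary and only the mechanics are shown.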
dc.identifier.citationcount 1
dc.identifier.doi 10.1109/INISTA52262.2021.9548414 en_US
dc.identifier.isbn 9781665436038
dc.identifier.scopus 2-s2.0-85116673208 en_US
dc.identifier.uri https://doi.org/10.1109/INISTA52262.2021.9548414
dc.identifier.uri https://hdl.handle.net/20.500.12469/4941
dc.khas 20231019-Scopus en_US
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof 2021 International Conference on INnovations in Intelligent SysTems and Applications, INISTA 2021 - Proceedings en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Convolutional Networks en_US
dc.subject Deep Learning en_US
dc.subject Long-Short Term Memory (LSTM) en_US
dc.subject Multimodal Data en_US
dc.subject Pretraining en_US
dc.subject Siamese networks en_US
dc.subject Triplet loss en_US
dc.subject Brain en_US
dc.subject Computer vision en_US
dc.subject Convolution en_US
dc.subject Convolutional neural networks en_US
dc.subject Deep neural networks en_US
dc.subject Network coding en_US
dc.subject Convolutional networks en_US
dc.subject Data retrieval en_US
dc.subject Deep learning en_US
dc.subject Image texts en_US
dc.subject Long-short term memory en_US
dc.subject Multi-modal en_US
dc.subject Multi-modal data en_US
dc.subject Pre-training en_US
dc.subject Siamese network en_US
dc.subject Triplet loss en_US
dc.subject Long short-term memory en_US
dc.title Multimodal Retrieval With Contrastive Pretraining en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional Arsan, Taner
gdc.author.scopusid 55364564400
gdc.author.scopusid 57289197300
gdc.author.scopusid 57288694000
gdc.author.scopusid 58353740700
gdc.author.scopusid 6506505859
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C4
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.departmenttemp Alsan, H.F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Yildiz, E., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Safdil, E.B., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arslan, F., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey; Arsan, T., Kadir Has University, Department of Computer Engineering, Istanbul, Turkey en_US
gdc.description.endpage 5
gdc.description.publicationcategory Conference Object - International - Institutional Faculty Member en_US
gdc.description.startpage 1
gdc.identifier.openalex W3202464651
gdc.oaire.diamondjournal false
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.7108664E-9
gdc.oaire.isgreen false
gdc.oaire.keywords Image texts
gdc.oaire.keywords Multi-modal data
gdc.oaire.keywords Convolutional Networks
gdc.oaire.keywords Long-Short Term Memory (LSTM)
gdc.oaire.keywords Pretraining
gdc.oaire.keywords Brain
gdc.oaire.keywords Long-short term memory
gdc.oaire.keywords Deep learning
gdc.oaire.keywords Siamese network
gdc.oaire.keywords Convolution
gdc.oaire.keywords Siamese networks
gdc.oaire.keywords Deep Learning
gdc.oaire.keywords Network coding
gdc.oaire.keywords Pre-training
gdc.oaire.keywords Multi-modal
gdc.oaire.keywords Triplet loss
gdc.oaire.keywords Deep neural networks
gdc.oaire.keywords Long short-term memory
gdc.oaire.keywords Computer vision
gdc.oaire.keywords Convolutional neural networks
gdc.oaire.keywords Multimodal Data
gdc.oaire.keywords Convolutional networks
gdc.oaire.keywords Data retrieval
gdc.oaire.popularity 4.684139E-9
gdc.oaire.publicfunded false
gdc.openalex.fwci 0.179
gdc.openalex.normalizedpercentile 0.46
gdc.opencitations.count 3
gdc.plumx.mendeley 4
gdc.plumx.scopuscites 4
gdc.scopus.citedcount 4
relation.isAuthorOfPublication 7959ea6c-1b30-4fa0-9c40-6311259c0914
relation.isAuthorOfPublication.latestForDiscovery 7959ea6c-1b30-4fa0-9c40-6311259c0914
relation.isOrgUnitOfPublication fd8e65fe-c3b3-4435-9682-6cccb638779c
relation.isOrgUnitOfPublication 2457b9b3-3a3f-4c17-8674-7f874f030d96
relation.isOrgUnitOfPublication b20623fc-1264-4244-9847-a4729ca7508c
relation.isOrgUnitOfPublication.latestForDiscovery fd8e65fe-c3b3-4435-9682-6cccb638779c

Files

Original bundle

Name:
4941.pdf
Size:
2.13 MB
Format:
Adobe Portable Document Format
Description:
Tam Metin / Full Text