CLIP4Hashing: Unsupervised Deep Hashing for Cross-Modal Video-Text Retrieval

Yaoxin Zhuo, Yikang Li, Jenhao Hsiao, Chiuman Ho, Baoxin Li

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

With the ever-increasing multimedia data on the Web, cross-modal video-text retrieval has received a lot of attention in recent years. Deep cross-modal hashing approaches utilize the Hamming space for achieving fast retrieval. However, most existing algorithms have difficulties in seeking or constructing a well-defined joint semantic space. In this paper, an unsupervised deep cross-modal video-text hashing approach (CLIP4Hashing) is proposed, which mitigates the difficulties in bridging between different modalities in the Hamming space through building a single hashing net by employing the pre-trained CLIP model. The approach is enhanced by two novel techniques, the dynamic weighting strategy and the design of the min-max hashing layer, which are found to be the main sources of the performance gain. Compared with conventional deep cross-modal hashing algorithms, CLIP4Hashing does not require data-specific hyper-parameters. With evaluation using three challenging video-text benchmark datasets, we demonstrate that CLIP4Hashing is able to significantly outperform existing state-of-the-art hashing algorithms. Additionally, with larger bit sizes (e.g., 2048 bits), CLIP4Hashing can even deliver competitive performance compared with the results based on non-hashing features.

Original language: English (US)
Title of host publication: ICMR 2022 - Proceedings of the 2022 International Conference on Multimedia Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 158-166
Number of pages: 9
ISBN (Electronic): 9781450392389
State: Published - Jun 27 2022
Event: 2022 International Conference on Multimedia Retrieval, ICMR 2022 - Newark, United States
Duration: Jun 27 2022 to Jun 30 2022

Publication series

Name: ICMR 2022 - Proceedings of the 2022 International Conference on Multimedia Retrieval

Conference

Conference: 2022 International Conference on Multimedia Retrieval, ICMR 2022
Country/Territory: United States
City: Newark
Period: 6/27/22 to 6/30/22

Keywords

  • cross-modal retrieval
  • deep learning
  • hashing
  • video-text retrieval

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
