Impact factor: 8.2
Affiliation: China University of Mining and Technology
Journal: IEEE Transactions on Geoscience and Remote Sensing
Keywords: Cross-modal remote-sensing image–text retrieval (CMRSITR), masked image modeling (MIM), masked language modeling (MLM), momentum contrast
Abstract: Cross-modal remote sensing image–text retrieval (CMRSITR) aims to extract comprehensive information from diverse modalities. The primary challenge in this field is developing effective mappings between visual and textual modalities to a shared latent space. Existing approaches generally focus on utilizing pretrained unimodal models to independently extract features from each modality. However, these techniques often fall short of achieving the critical alignment necessary for effective cross-modal matching. These techniques predominantly concentrate on the extraction of features and alignment at an instance level, suggesting potential areas for enhancement. To address these limitations, we introduce the masked interaction inferring and aligning (MIIA) framework, utilizing dynamic contrastive learning (DCL). This framework is adept at discerning the intricate relationships between local visual–textual tokens, thereby significantly bolstering the congruence of global image–text pairings without relying on additional prior supervision. Initially, we devise a masked interaction inferring (MII) module, which fosters token-level interplays through a novel masked visual-language (VL) modeling approach. Following this, we implement a cross-modal DCL mechanism, which is instrumental in capturing and aligning semantic correlations between images and texts more effectively. Finally, to ensure the comprehensive matching of visual and textual embeddings, we introduce a unique technique known as bidirectional distribution matching (BDM). This method is designed to minimize the Kullback–Leibler (KL) divergence between the distributions of image–text similarity, computed using the negative queues in momentum contrast learning. Comprehensive experiments performed on well-established public datasets consistently validate the state-of-the-art performance of MIIA methods in the CMRSITR task.
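Paper type: Journal article
Article number: 5626215
Discipline category: Engineering
First-level discipline: Computer Science and Technology
Document type: J
Volume: 62
Issue: 2024
Translated work: No
Publication date: 2024-06-21
Indexed in: SCI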