(48): MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
- Abstract
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Fine-grained Visual/Textual Features
- Visual feature representation
- Textual feature representation
- 3.2 Training Objective and Learning Settings
- Contrastive loss
- Multimodal Similarity Functions
- Weakly-supervised setting
- Unsupervised setting