Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago
Mapping Memes to Words for Multimodal Hateful Meme Classification Paper • 2310.08368 • Published Oct 12, 2023
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval Paper • 2405.02951 • Published May 5, 2024
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation Paper • 2407.03056 • Published Jul 3, 2024
Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing Paper • 2403.14828 • Published Mar 21, 2024
ECO: Ensembling Context Optimization for Vision-Language Models Paper • 2307.14063 • Published Jul 26, 2023
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features Paper • 2308.11485 • Published Aug 22, 2023
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization Paper • 2408.15914 • Published Aug 28, 2024
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On Paper • 2305.13501 • Published May 22, 2023
Zero-Shot Composed Image Retrieval with Textual Inversion Paper • 2303.15247 • Published Mar 27, 2023
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing Paper • 2304.02051 • Published Apr 4, 2023