Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 22
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 26
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 52
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024 • 56
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection Paper • 2004.07667 • Published Apr 16, 2020
Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation Paper • 2305.16938 • Published May 26, 2023
Lexical Generalization Improves with Larger Models and Longer Training Paper • 2210.12673 • Published Oct 23, 2022
Data Contamination Report from the 2024 CONDA Shared Task Paper • 2407.21530 • Published Jul 31, 2024 • 10
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 24
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024 • 17
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering Paper • 2303.11897 • Published Mar 21, 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 32