BrushEdit: All-In-One Image Inpainting and Editing Paper • 2412.10316 • Published Dec 13, 2024 • 33
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17, 2024 • 34
What matters when building vision-language models? Paper • 2405.02246 • Published May 3, 2024 • 102
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published May 13, 2024 • 20
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14, 2024 • 55
FiT: Flexible Vision Transformer for Diffusion Model Paper • 2402.12376 • Published Feb 19, 2024 • 48
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 48