SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding • Paper • 2412.09604 • Published 13 days ago
Mechanistic Permutability: Match Features Across Layers • Paper • 2410.07656 • Published Oct 10
Molmo • Collection • Artifacts for open multimodal language models. • 5 items • Updated 28 days ago
XGen-MM-1 models and datasets • Collection • A collection of all XGen-MM (Foundation LMM) models. • 16 items • Updated 6 days ago
PDF Document / OCR Datasets • Collection • Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30
Visual Scorers! • Collection • Variants of the visual evaluation models proposed in [Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-defined Levels]. Use via `model.score()`. • 10 items • Updated 24 days ago
Gemma 2 2B Release • Collection • The 2.6B-parameter version of Gemma 2. • 6 items • Updated 12 days ago
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents • Paper • 2407.18901 • Published Jul 26
Theia: Distilling Diverse Vision Foundation Models for Robot Learning • Paper • 2407.20179 • Published Jul 29
Wolf: Captioning Everything with a World Summarization Framework • Paper • 2407.18908 • Published Jul 26
WebUI (CHI 2023) • Collection • Learning Mobile User Interface Representation with Web Semantics • 23 items • Updated Nov 1
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices • Paper • 2406.08451 • Published Jun 12
Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian • Paper • 2405.13929 • Published May 22
Many-Shot In-Context Learning in Multimodal Foundation Models • Paper • 2405.09798 • Published May 16