Specialized Language Models with Cheap Inference from Limited Domain Data Paper • 2402.01093 • Published Feb 2
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published Sep 16
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning Paper • 2410.06456 • Published Oct 9
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published Oct 28
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 21 days ago
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16