-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 26 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 43 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 22
Collections
Discover the best community collections!
Collections including paper arxiv:2502.17157
-
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Paper • 2502.14786 • Published • 128 -
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
Paper • 2502.14834 • Published • 24 -
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 157 -
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper • 2502.17157 • Published • 51
-
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Paper • 2501.04001 • Published • 43 -
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Paper • 2501.03895 • Published • 50 -
An Empirical Study of Autoregressive Pre-training from Videos
Paper • 2501.05453 • Published • 37 -
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
Paper • 2501.07556 • Published • 5
-
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 86 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 349 -
Progressive Multimodal Reasoning via Active Retrieval
Paper • 2412.14835 • Published • 73 -
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 70
-
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper • 2410.09008 • Published • 17 -
answerdotai/ModernBERT-base
Fill-Mask • Updated • 6.09M • 784 -
answerdotai/ModernBERT-large
Fill-Mask • Updated • 273k • 360 -
microsoft/phi-4
Text Generation • Updated • 563k • • 1.88k
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 57
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 27 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 40 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 53 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 17 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 60 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 74