ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance Paper • 2504.08716 • Published 7 days ago
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 9 days ago
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published 10 days ago
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 10 days ago
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 11 days ago
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 16 days ago