nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1, 2024 • 10
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 21 days ago • 141
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 110
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control Paper • 2405.04798 • Published May 8, 2024 • 1
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 27 days ago • 47
view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 71
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 27
view article Article Mastering Long Contexts in LLMs with KVPress By nvidia and 1 other • Jan 23 • 64
view article Article Fine-tune ModernBERT for RAG with Synthetic Data By sdiazlor and 2 others • Jan 20 • 37