Collections
Discover the best community collections!
Collections including paper arxiv:2404.08801
-
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 31 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138 -
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 42 -
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • 2404.07143 • Published • 103
-
Jamba: A Hybrid Transformer-Mamba Language Model
Paper • 2403.19887 • Published • 104 -
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Paper • 2404.00399 • Published • 41 -
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper • 2404.02258 • Published • 104 -
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 63
-
I am a Strange Dataset: Metalinguistic Tests for Language Models
Paper • 2401.05300 • Published • 4 -
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 63 -
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
Paper • 2305.09781 • Published • 4 -
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 26
-
Can large language models explore in-context?
Paper • 2403.15371 • Published • 32 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 35 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2 -
AQuA: A Benchmarking Tool for Label Quality Assessment
Paper • 2306.09467 • Published • 1
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 21 -
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 21 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 10 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 78