- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 25
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
  Paper • 2403.03194 • Published • 12
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 24

Collections including paper arxiv:2402.10790

- Transformers are Multi-State RNNs
  Paper • 2401.06104 • Published • 36
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 40
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602

- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 25
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 26
- Extending LLMs' Context Window with 100 Samples
  Paper • 2401.07004 • Published • 15
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 21

- Repeat After Me: Transformers are Better than State Space Models at Copying
  Paper • 2402.01032 • Published • 22
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 30
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 40

- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
  Paper • 2401.03462 • Published • 27
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 9
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
  Paper • 2401.02669 • Published • 14

- Analyzing and Improving the Training Dynamics of Diffusion Models
  Paper • 2312.02696 • Published • 31
- Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
  Paper • 2401.02994 • Published • 48
- Rethinking Patch Dependence for Masked Autoencoders
  Paper • 2401.14391 • Published • 23
- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 40

- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
  Paper • 2311.12022 • Published • 25
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 183
- gorilla-llm/APIBench
  Updated • 145 • 63
- Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
  Paper • 2312.04724 • Published • 20

- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- AgentTuning: Enabling Generalized Agent Abilities for LLMs
  Paper • 2310.12823 • Published • 35
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  Paper • 2308.10848 • Published • 1
- CLEX: Continuous Length Extrapolation for Large Language Models
  Paper • 2310.16450 • Published • 9