Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427, published Feb 29, 2024)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models (arXiv:2401.06066, published Jan 11, 2024)
WizardLM: Empowering Large Language Models to Follow Complex Instructions (arXiv:2304.12244, published Apr 24, 2023)