MoE-LLaVA: Mixture of Experts for Large Vision-Language Models • arXiv:2401.15947 • Published Jan 29, 2024
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • Published Jan 8, 2024
EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models • arXiv:2308.14352 • Published Aug 28, 2023
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion • arXiv:2308.06512 • Published Aug 12, 2023
Experts Weights Averaging: A New General Training Scheme for Vision Transformers • arXiv:2308.06093 • Published Aug 11, 2023
ConstitutionalExperts: Training a Mixture of Principle-based Prompts • arXiv:2403.04894 • Published Mar 7, 2024
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition • arXiv:2402.02526 • Published Feb 4, 2024
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding • arXiv:2006.16668 • Published Jun 30, 2020
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models • arXiv:2402.01739 • Published Jan 29, 2024
ST-MoE: Designing Stable and Transferable Sparse Expert Models • arXiv:2202.08906 • Published Feb 17, 2022
LocMoE: A Low-overhead MoE for Large Language Model Training • arXiv:2401.13920 • Published Jan 25, 2024
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale • arXiv:2201.05596 • Published Jan 14, 2022
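The papers in this collection all build on the same basic sparsely routed MoE layer: a small gating network scores the experts for each token and only the top-k experts are executed, so parameter count grows while per-token compute stays roughly constant. The sketch below is a minimal, illustrative PyTorch version of that layer; the class name, dimensions, and the dense per-expert loop are assumptions for readability, not the implementation used by any specific paper above (production systems such as GShard or DeepSpeed-MoE dispatch tokens to experts in parallel and add load-balancing losses).

```python
# Minimal top-k routed MoE layer (illustrative sketch; names and sizes are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router: scores experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # per-token expert probabilities
        topv, topi = scores.topk(self.k, dim=-1)            # keep only the k best experts per token
        topv = topv / topv.sum(dim=-1, keepdim=True)        # renormalize the kept weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):           # dense loop for clarity; real systems
            mask = (topi == e)                              # gather and dispatch tokens in parallel
            if mask.any():
                rows = mask.any(dim=-1)                     # tokens that routed to expert e
                w = (topv * mask).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])            # weighted expert output
        return out

x = torch.randn(16, 512)     # 16 tokens
y = TopKMoE()(x)             # same shape, but only ~k/n_experts of the FFN compute per token
print(y.shape)               # torch.Size([16, 512])
```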