Abstract
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the multi-head mechanism to collectively attend to information from various representation spaces within different experts. In this paper, we present a novel implementation of MH-MoE that maintains both FLOPs and parameter parity with sparse Mixture of Experts models. Experimental results on language models show that the new implementation yields quality improvements over both vanilla MoE and fine-grained MoE models. Additionally, our experiments demonstrate that MH-MoE is compatible with 1-bit Large Language Models (LLMs) such as BitNet.
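Below is a minimal, illustrative sketch of the MH-MoE idea described in the abstract: each token is split into several heads, each head is routed to experts independently, and the processed heads are merged back into a single token. The class name, layer sizes, top-k routing, and the simple loop-based dispatch are assumptions for readability, not the authors' released implementation or their FLOPs-matching recipe.

```python
import torch
import torch.nn as nn

class MHMoESketch(nn.Module):
    """Simplified Multi-Head MoE layer (illustrative only)."""
    def __init__(self, d_model=512, n_heads=4, n_experts=8, top_k=2):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.top_k = top_k
        # Head split / merge projections, analogous to multi-head attention.
        self.split = nn.Linear(d_model, d_model)
        self.merge = nn.Linear(d_model, d_model)
        # Each expert is a small FFN that operates on one head slice.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(self.d_head, 4 * self.d_head), nn.GELU(),
                          nn.Linear(4 * self.d_head, self.d_head))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(self.d_head, n_experts)

    def forward(self, x):                                # x: (batch, seq, d_model)
        b, s, d = x.shape
        # 1) Split every token into n_heads sub-tokens.
        h = self.split(x).view(b * s * self.n_heads, self.d_head)
        # 2) Route each sub-token to its top-k experts.
        logits = self.router(h)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(h[mask])
        # 3) Merge the processed sub-tokens back into full tokens.
        return self.merge(out.view(b, s, d))
```

Because each expert sees head-sized vectors rather than full tokens, the per-token compute can be kept comparable to a standard sparse MoE layer, which is the parity property the abstract highlights.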
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts (2024)
- Upcycling Large Language Models into Mixture of Experts (2024)
- Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts (2024)
- MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts (2024)
- ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts (2024)
- On Expert Estimation in Hierarchical Mixture of Experts: Beyond Softmax Gating Functions (2024)
- MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
It seems that x and y in Equation 3 are incorrect. I believe the equation should be updated to align with the one provided in the attached image.
Thank you for pointing out the issue with Equation 3. We appreciate your attention to detail. We will update this equation with \tilde{x}_h.