What does A14 mean? Could we get the details of the Qwen MoE architecture?
#1
by
JohnSaxon
- opened
As the title says.
Something tells me A14B stands for the active weights of the model.
That's true. A14B means that, out of 57B total parameters, 14B are activated for each token.
- The general introduction of the architecture is at https://qwenlm.github.io/blog/qwen-moe/#architecture.
- The hyperparameters of the architecture of this model are at https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct/blob/main/config.json.
- The implementation is at https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_moe/modeling_qwen2_moe.py.
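To make the total vs. active distinction concrete, here is a rough sketch of how the expert MLP parameters could be counted from the config fields. The key names follow the `Qwen2MoeConfig` fields, but the numbers below are small hypothetical values chosen for illustration, not the real ones; read the real values from the config.json linked above. The helper `moe_mlp_params` is made up for this example, and it assumes each expert is a gated MLP with three bias-free projections. It ignores attention, embeddings, the router, and norms, so it will not reproduce the full 57B/14B figures on its own.

```python
def moe_mlp_params(cfg):
    """Estimate (total, active) expert-MLP parameter counts.

    Assumption: each expert is a gated MLP with three weight matrices
    (gate, up, down) and no biases. Attention, embeddings, router and
    norm weights are deliberately left out of this estimate.
    """
    hidden = cfg["hidden_size"]
    # Parameters of one routed expert and of the always-on shared expert.
    per_expert = 3 * hidden * cfg["moe_intermediate_size"]
    shared = 3 * hidden * cfg["shared_expert_intermediate_size"]
    layers = cfg["num_hidden_layers"]

    # Total: every routed expert in every layer, plus the shared expert.
    total = layers * (cfg["num_experts"] * per_expert + shared)
    # Active: only the experts selected per token, plus the shared expert.
    active = layers * (cfg["num_experts_per_tok"] * per_expert + shared)
    return total, active


# Hypothetical toy values for illustration only (NOT the real config).
toy_cfg = {
    "hidden_size": 4,
    "moe_intermediate_size": 2,
    "shared_expert_intermediate_size": 8,
    "num_experts": 4,
    "num_experts_per_tok": 2,
    "num_hidden_layers": 1,
}

total, active = moe_mlp_params(toy_cfg)
print(f"total expert-MLP params:  {total}")
print(f"active expert-MLP params: {active}")
```

Swapping `toy_cfg` for the dictionary loaded from the real config.json (for example with `json.load`) gives the expert-MLP share of the total and active counts; the gap between the two comes entirely from `num_experts` versus `num_experts_per_tok`.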
jklj077
changed discussion status to
closed