What is the correct norm eps for 14B and 32B base models?
#1, opened by msdejong
I wanted to check whether the norm eps for the 14B and 32B base models is correct. It's listed as 1e-5 for the base versions, but it's 1e-6 for the instruct versions and for all of the 3B and 7B models, both instruct and base.
I also asked the same question here:
https://huggingface.co./Qwen/Qwen2.5-14B/discussions/6
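For anyone who wants to reproduce the comparison, here is a rough sketch that groups model configs by their norm epsilon. It assumes the value is stored under the `rms_norm_eps` key in each model's `config.json`. The table below just restates the values reported in this post (they are not re-fetched), and the model names other than Qwen2.5-14B are my guess at the matching repo names. `load_config` and `group_by_eps` are hypothetical helper names.

```python
import json
from collections import defaultdict

# Values as reported in the post above, NOT re-checked against the hosted files.
# Model names other than Qwen2.5-14B are assumed repo names.
REPORTED = {
    "Qwen2.5-3B": {"rms_norm_eps": 1e-6},
    "Qwen2.5-3B-Instruct": {"rms_norm_eps": 1e-6},
    "Qwen2.5-7B": {"rms_norm_eps": 1e-6},
    "Qwen2.5-7B-Instruct": {"rms_norm_eps": 1e-6},
    "Qwen2.5-14B": {"rms_norm_eps": 1e-5},
    "Qwen2.5-14B-Instruct": {"rms_norm_eps": 1e-6},
    "Qwen2.5-32B": {"rms_norm_eps": 1e-5},
    "Qwen2.5-32B-Instruct": {"rms_norm_eps": 1e-6},
}


def load_config(path):
    """Read a locally downloaded config.json into a dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def group_by_eps(configs, key="rms_norm_eps"):
    """Map each epsilon value to the list of model names that use it."""
    groups = defaultdict(list)
    for name, cfg in configs.items():
        # cfg.get returns None if the key is missing, so such models
        # end up in their own group instead of raising.
        groups[cfg.get(key)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for eps, names in sorted(group_by_eps(REPORTED).items()):
        print(eps, sorted(names))
```

To check against the real files, download each `config.json` and build the dict with `load_config` instead of using the hard-coded table.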