Thanks (#1), opened by MB7977
Thank you for this. I'm curious to see whether an exl2 version might cure some of the issues I've had with other quants of it. Just a heads-up that you're missing the config.json. I used the one from TheBloke's repo but needed to fix max_position_embeddings, as it's incorrectly set to 2048. When quanting, exllamav2 threw an illegal memory access/shaping error until I changed it to 4096 (I am quantizing with that sequence length).
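For anyone hitting the same error, a minimal sketch of the fix described above: patch max_position_embeddings in config.json before quantizing. The config keys and values here are illustrative stand-ins, not the model's full config.

```python
import json
import os
import tempfile

# Minimal stand-in for the model's config.json (illustrative values only);
# in practice, edit the config.json in your local model directory.
cfg = {"model_type": "llama", "max_position_embeddings": 2048}

path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(cfg, f)

# Patch the context length to the model's true value before running exllamav2.
with open(path) as f:
    cfg = json.load(f)
cfg["max_position_embeddings"] = 4096
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)

print(cfg["max_position_embeddings"])
```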
Thanks! I didn't upload it because I ran the same test and it failed for me at 2048 ctx. I've just uploaded the corrected config.json.