Request
This model is spectacular and is approaching Claude 3 levels of creativity/prose. I know this might be a long shot, but I was wondering if you'd consider fine-tuning Qwen 2.5 14B? I've found it's quite a bit smarter than Nemo and I really like its writing style (though it does contain GPT slop and refusals). Just thought I'd throw that out there - would love to hear what you think!
Hear, hear. I think Nemo's tokenizer is intriguing, but I'd like to see what can happen with that Qwen 2.5 14B model!
We can definitely try. In the meantime, you might like this finetune done by another person on Supernova Medius 14B (a cross-architecture Qwen-Llama 3 distillation done by Arcee) - https://huggingface.co./underwoods/medius-erebus-magnum-14b
No guarantees it works but in my testing it seemed pretty good.