We ran an experiment to revive LLaMA 1 33B: it has unique prose, its pretraining data lacks "GPT-isms" and "slop", and it was one of the favorites at the time. Over multiple finetune runs, we extended the model from its pretrained context length of 2048 tokens to ~12,000 tokens, adding approx. 500M tokens in the process. The effective length is 16,384, but it's better to stay in the lower range. It writes well and in multiple formats. We have some ideas for the future, such as implementing GQA. Please take a look; we would love to hear your feedback!
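For anyone who wants to stay in that lower range, here is a minimal sketch of trimming a tokenized prompt so that prompt plus generation fits a chosen limit. The two constants come from the numbers above; the function name `trim_to_budget` and its parameters are hypothetical and not part of the model or any library, and it assumes you already have a list of token IDs from whatever tokenizer you use.

```python
# Numbers taken from the post; everything else here is an illustrative assumption.
RECOMMENDED_CONTEXT = 12_000  # approximate length the finetune was extended to
MAX_CONTEXT = 16_384          # stated effective length


def trim_to_budget(token_ids, max_new_tokens, context_limit=RECOMMENDED_CONTEXT):
    """Keep the most recent tokens so prompt + generation fits context_limit."""
    if context_limit > MAX_CONTEXT:
        raise ValueError("context_limit exceeds the stated effective length")
    if not 0 <= max_new_tokens < context_limit:
        raise ValueError("max_new_tokens must leave room for at least one prompt token")

    # Reserve space for the tokens that will be generated.
    prompt_budget = context_limit - max_new_tokens

    # Drop the oldest tokens first; the most recent context is kept.
    return list(token_ids[-prompt_budget:])


if __name__ == "__main__":
    long_prompt = list(range(20_000))  # stand-in for real token IDs
    trimmed = trim_to_budget(long_prompt, max_new_tokens=512)
    print(len(trimmed))
```

Dropping the oldest tokens is just one choice; for chat or roleplay you may prefer to keep a system prompt at the front and trim the middle instead.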
ZeusLabs/Chronos-Divergence-33B