If you can record the problem and share it there, or on the forums in your own post, please don't be shy. I'm not sure, but I do think it helps.
Boomers still pick zenodo.org instead of Hugging Face??? Absolutely clownish nonsense: my random datasets have 30x more downloads and views than front-page Zenodo datasets... Gonna write a comparison blog, but yeah... cringe.
Scroll down for the datasets. Still figuring out how to optimize for discoverability; I do think on that part it will be better than zenodo.org. It would be nice to write a tutorial about that and compare: we already have more downloads than most Zenodo datasets from famous researchers!
Perhaps the largest single training dataset of high-quality text to date: 7.8 trillion tokens in 35 European languages and code.
The best part: the data was correctly licensed, so it's actually future-proof!
The completions model is really creative, and the instruct fine-tuned version is very good too.
Now you can use such models for multilingual enterprise applications with further fine-tunes; long response generation and structured outputs (coding) also work.
@mlabonne hey there, I kinda got obsessed with your great model, and I found the endpoint for it on Lambda Labs, but basically I got rate limited / banned for trying to make my DPO dataset project. I was wondering if you all had an OpenAI-compatible solution for me to make a great "thinking" SFT + DPO dataset with all the splits. Kinda desperate, it's true, but I was looking forward to a nice write-up.