Beginner Questions about Dataset Formatting and Hardware

#11
by LogicBombaklot - opened

I would like to train ALMA-R on a new low-resource language. I am very new to LLMs, but I am intrigued by the possibilities.

Can you please advise on how my monolingual and parallel datasets should be formatted for fine-tuning?

Can you also advise on how much VRAM I will need, and whether I can use multiple GPUs?

Thank you, and great job on the model and the papers. I found CPO really interesting.
