ALMA updates?

#1
by cmp-nct - opened

Did you consider releasing ALMA based on Gemma and Llama-3?

For German translation, Gemma-2 is better than ALMA; the 27B is almost flawless.
Llama-3 should also be an upgrade; not sure about Phi-3.

Owner

Good question! Probably we will release a llama-3-based ALMA in the near future!
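For anyone who wants to try the current checkpoints in the meantime: ALMA expects a plain translation prompt rather than a chat template. A minimal sketch of building that prompt, assuming the "Translate this from X to Y" format described in the ALMA README (verify against the model card of the exact checkpoint you load):

```python
# Sketch of the ALMA-style translation prompt (assumed format from the ALMA
# README; double-check the model card for the checkpoint you actually use).
def alma_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build a plain-text translation prompt in the ALMA style."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )

prompt = alma_prompt("German", "English", "Guten Morgen!")
print(prompt)
```

The resulting string would then be fed to the model via `transformers` (tokenize, `model.generate(...)`, decode), with the completion after the final `English:` line being the translation.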

Will there be support for more languages?

Also, how does ALMA compare to (any comments about):
https://huggingface.co./SeaLLMs/SeaLLM-7B-v2.5 (based on gemma)
https://huggingface.co./SeaLLMs/SeaLLMs-v3-7B-Chat (based on qwen2)

I would be very interested in both Llama 3.1 and Phi-3 (any size) implementations of ALMA-R.

ALMA paper mentions translating between:

  • English ↔ de, cs, is, ru, zh

Llama 2/3 supports / is trained on:

  • English, de, fr, zh, it, pt, hi, es

SeaLLM's pre-training performs well/better on:

  • English, zho, vie, ind, msa, tgl, tha, lao

Qwen2 is trained on:

  • English, de, fr, es, pt, it, nl;
  • ru, cs, pl; ar, fa, he, tr;
  • ja, ko; vi, th, id, ms, lo, my, km, tl;
  • hi, bn, ur

(See Lists_of_ISO_639_codes)

Will there be support for more languages for ALMA? Perhaps you might not want to train as much on the less common languages, but shouldn't adding more languages, especially the ones the base model was pre-trained on, improve its translation capabilities? Would it generalize better? Have you done ablation tests?

Also, what can be done during fine-tuning/optimization to maintain the language model's score on the leaderboard? On the old leaderboard:
51.68 ALMA-13B-Pretrain
49.32 ALMA-13B
47.85 ALMA-13B-R
(The fine-tuned models score very slightly lower on the leaderboard, but translation capabilities improve, per the ALMA paper.)
