What is the difference between this model and OpenLLaMA 7Bv2?

#1
by weiyucheng - opened

The training dataset seems to be the same, but this model's performance is much better.

The sole difference lies in the training framework, which was shifted from JAX on TPU to Megatron-LM on GPU. The training loss is also lower.
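
For anyone curious what that looks like in practice, here is a minimal sketch of a Megatron-LM-style pretraining launch. Only the cosine schedule and the 3e-4 peak lr below come from this thread; the model dims are the standard LLaMA-7B shape, and the parallelism, warmup, and step counts are illustrative assumptions (data and tokenizer flags are omitted for brevity).

```python
# Hedged sketch of a Megatron-LM launch for a LLaMA-7B-shaped model,
# assembled as a Python arg list. Run from a Megatron-LM checkout;
# data/tokenizer flags are omitted, so this is a shape, not a recipe.
import subprocess

args = [
    "torchrun", "--nproc_per_node=8", "pretrain_gpt.py",
    # Standard LLaMA-7B architecture
    "--num-layers", "32",
    "--hidden-size", "4096",
    "--num-attention-heads", "32",
    "--seq-length", "2048",              # assumed context length
    "--max-position-embeddings", "2048",
    # Batch: 2048 seqs x 2048 tokens ~= 4M tokens/step, as stated below
    "--micro-batch-size", "4",           # assumed per-GPU micro batch
    "--global-batch-size", "2048",
    # Schedule: cosine decay from a 3e-4 peak, as stated in this thread
    "--lr", "3e-4",
    "--lr-decay-style", "cosine",
    "--min-lr", "3e-5",                  # assumed floor (10% of peak)
    "--lr-warmup-iters", "2000",         # assumed
    "--train-iters", "250000",           # assumed
    "--bf16",
]
subprocess.run(args, check=True)
```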

@itsliupeng Are the hyperparameters the same?

Yes, a cosine lr schedule with a peak of 3e-4 and a batch size of 4M tokens, the same as LLaMA 2 7B.
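
To make those numbers concrete, here is a small self-contained sketch of the schedule. The 3e-4 peak and 4M-token batch are from this thread; the warmup, lr floor, total steps, and 2048-token context are assumptions for illustration.

```python
import math

# Sketch of the quoted schedule: linear warmup to a 3e-4 peak, then
# cosine decay. Warmup, floor, step count, and seq length are assumed.
PEAK_LR, MIN_LR = 3e-4, 3e-5        # floor assumed at 10% of peak
WARMUP, TOTAL = 2_000, 250_000      # assumed step counts

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    t = (step - WARMUP) / (TOTAL - WARMUP)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * t))

# "4M tokens" per batch: at an assumed 2048-token context, that is
# 4 * 2**20 / 2048 = 2048 sequences per optimizer step.
print(4 * 2**20 // 2048)            # -> 2048 sequences per step
print(lr_at(WARMUP), lr_at(TOTAL))  # -> peak (3e-4), then floor (3e-5)
```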

itsliupeng changed discussion status to closed
