K2
K2, LLM360's most powerful, scaled model series.
We encountered two major loss spikes while training K2.
We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
Loss spikes are still poorly understood. By making the checkpoints around these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge on this topic.
Checkpoints | |
---|---|
Checkpoint 186 | Checkpoint 194 |
Checkpoint 188 | Checkpoint 196 |
Checkpoint 190 | Checkpoint 198 |
Checkpoint 192 | Checkpoint 200 |
[to find all branches: git branch -a]
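As a rough illustration of working with the branch listing, the sketch below parses the text printed by `git branch -a` and picks out branches whose names end in a checkpoint number. The helper names (`parse_branches`, `checkpoint_branches`) and the `ckpt_186`-style branch names in the sample are hypothetical and used only for demonstration; check the real output of `git branch -a` for the actual naming.

```python
import re


def parse_branches(git_branch_output: str) -> list[str]:
    """Return unique branch names from the text printed by `git branch -a`."""
    names: list[str] = []
    for line in git_branch_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the "* " marker that git puts on the current branch.
        if line.startswith("* "):
            line = line[2:]
        # Skip detached-HEAD notes and symbolic refs such as
        # "remotes/origin/HEAD -> origin/main".
        if line.startswith("(") or " -> " in line:
            continue
        # Strip the "remotes/<remote>/" prefix from remote-tracking branches.
        match = re.match(r"remotes/[^/]+/(.+)", line)
        name = match.group(1) if match else line
        if name not in names:
            names.append(name)
    return names


def checkpoint_branches(names: list[str]) -> dict[int, str]:
    """Map checkpoint number -> branch name for names that end in digits."""
    found: dict[int, str] = {}
    for name in names:
        match = re.search(r"(\d+)$", name)
        if match:
            found[int(match.group(1))] = name
    return dict(sorted(found.items()))


# Hypothetical sample output; the real branch names may differ.
sample = """\
* main
  remotes/origin/HEAD -> origin/main
  remotes/origin/main
  remotes/origin/ckpt_188
  remotes/origin/ckpt_186
"""

branches = parse_branches(sample)
checkpoints = checkpoint_branches(branches)
```

In practice the input string could come from `subprocess.run(["git", "branch", "-a"], capture_output=True, text=True).stdout`, run inside a local clone of the repository.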
View all the evaluations on our Weights & Biases here
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
BibTeX:
@misc{
title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
author={The LLM360 Team},
year={2024},
}