---
language:
- en
- zh
license: apache-2.0
library_name: transformers
datasets:
- EleutherAI/pile
- togethercomputer/RedPajama-Data-1T
- p208p2002/wudao
widget:
- text: "<s> 4 + 3 ="
---
# MiniLoong-3B
arXiv | GitHub | HuggingFace-MiniMA-3B | HuggingFace-MiniChat-3B | ModelScope-MiniMA-3B | ModelScope-MiniChat-3B | HuggingFace-MiniChat-1.5-3B | HuggingFace-MiniMA-2-3B | HuggingFace-MiniChat-2-3B | HuggingFace-MiniMA-2-1B | HuggingFace-MiniLoong-3B | HuggingFace-MiniMix-2/4x3B
Must comply with the LICENSE of LLaMA-2, since this model is derived from LLaMA-2.
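
A minimal usage sketch with the transformers library, based on the widget prompt in the metadata above. The repo id `GeneZC/MiniLoong-3B` and the generation settings are assumptions, not confirmed by this card; adjust them to the actual Hub path and your hardware.

```python
# Minimal sketch: load MiniLoong-3B with transformers and run the widget prompt.
# The repo id "GeneZC/MiniLoong-3B" is an assumption; replace it with the actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GeneZC/MiniLoong-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumed precision; use float32 on CPU
    device_map="auto",
)

prompt = "<s> 4 + 3 ="  # widget example from the metadata above
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```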
## Bibtex
    @article{zhang2023law,
      title={Towards the Law of Capacity Gap in Distilling Language Models},
      author={Zhang, Chen and Song, Dawei and Ye, Zheyu and Gao, Yan},
      year={2023},
      url={https://arxiv.org/abs/2311.07052}
    }