Kiwi-1.0-0.7B-32k
Pretrained Model
- Developed by: EmpirischTech/ChaperoneAI
- Backbone Model: Qwen2.5
- Parameters: 700M
- Context Window: 32k
- Language(s): English
- Library: HuggingFace Transformers
- License: Creative Commons Attribution 4.0 (CC BY 4.0)
- Contact: For questions and comments about the model, please email contact-us
Main Message
We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on 18 trillion tokens, our model was trained on only 5 billion tokens (over three orders of magnitude fewer), yet it achieves comparable performance.
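The layer-selection step of depthwise scaling can be sketched as follows. This is a minimal illustration, assuming a SOLAR-style recipe (duplicate the base layer stack, drop the last `m` layers of one copy and the first `m` of the other, then concatenate); the layer count and overlap `m` below are hypothetical, not Kiwi's actual configuration.

```python
def depthwise_scale(layers, m):
    """From n base layers, build a deeper stack of 2*(n - m) layers
    by concatenating two overlapping copies of the base stack."""
    n = len(layers)
    return layers[: n - m] + layers[m:]

# Illustrative example: a 24-layer base model scaled with m=4
base = [f"layer_{i}" for i in range(24)]
scaled = depthwise_scale(base, m=4)
assert len(scaled) == 2 * (24 - 4)  # 40 layers after scaling
```

The up-scaled model is then given continued pretraining so the duplicated layers can specialize.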
Note: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.
Evaluation Results
Perplexity as Evaluation Metric
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A lower perplexity score indicates better performance (i.e., the model is more confident in its predictions). Perplexity directly measures a model's ability to predict the next token, providing a clear gauge of its inherent language modeling performance without the influence of instruction tuning.
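Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to each observed token. A minimal self-contained sketch (the probabilities below are made up for illustration):

```python
import math

def perplexity(token_probs):
    """PPL = exp(mean negative log-likelihood) over the observed tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that spreads probability uniformly over 4 choices
# assigns p = 0.25 to every token, giving a PPL of about 4.
print(perplexity([0.25, 0.25, 0.25]))
```

In practice, the per-token probabilities come from the model's softmax outputs over a held-out corpus, but the formula is exactly this.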
Main Results
| Dataset | Qwen2.5-0.5B | Kiwi-0.7B |
|---|---|---|
| Hellaswag | 44.82 | 83.74 |
| Arc Challenge | 41.92 | 59.50 |
| Open Book QA | 152.56 | 323.18 |
Hardware and Software
- Hardware: We used an NVIDIA A100 GPU to train the model
- Training Factors: The model was pretrained using a combination of the DeepSpeed library and the HuggingFace Trainer
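When combining DeepSpeed with the HuggingFace Trainer, the DeepSpeed settings are supplied as a JSON config (or an equivalent Python dict). The fragment below is purely illustrative, with hypothetical batch-size and precision values, not the authors' actual training configuration:

```python
# Illustrative DeepSpeed ZeRO Stage 2 config fragment; all values are
# hypothetical. It would typically be passed to the HuggingFace Trainer
# via TrainingArguments(deepspeed="ds_config.json") or as a dict.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```

ZeRO Stage 2 partitions optimizer states and gradients across devices, which keeps memory usage manageable during continued pretraining.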
Contact Us
EmpirischTech/ChaperoneAI: Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► Get in touch