Kiwi-1.0-0.7B-32k
Pretrained Model
- Developed by: EmpirischTech/ChaperoneAI
- Backbone Model: Qwen2.5
- Parameters: 700M
- Context Window: 32k
- Language(s): English
- Library: HuggingFace Transformers
- License: Creative Commons Attribution 4.0 (CC BY 4.0)
- Contact: For questions and comments about the model, please email contact-us
Main Message
We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling high-performance LLMs from smaller models.
In our approach, we carefully selected the dense layers from Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on 18 trillion tokens, our model was trained on only 5 billion tokens (over three orders of magnitude fewer), yet it achieves comparable performance.
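The layer-selection step of depthwise scaling can be sketched as follows. This is a minimal illustration, assuming a SOLAR-style recipe (duplicate the base layer stack, drop the last `m` layers of one copy and the first `m` of the other, then concatenate); the layer count and overlap `m` below are hypothetical, not Kiwi's actual configuration.

```python
def depthwise_scale(layers, m):
    """From n base layers, build a deeper stack of 2*(n - m) layers
    by concatenating two overlapping copies of the base stack."""
    n = len(layers)
    return layers[: n - m] + layers[m:]

# Illustrative example: a 24-layer base model scaled with m=4
base = [f"layer_{i}" for i in range(24)]
scaled = depthwise_scale(base, m=4)
assert len(scaled) == 2 * (24 - 4)  # 40 layers after scaling
```

The up-scaled model is then given continued pretraining so the duplicated layers can specialize.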
Note: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.
Evaluation Results
Perplexity as Evaluation Metric
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A lower perplexity score indicates better performance (i.e., the model is more confident in its predictions). Perplexity directly measures a model's ability to predict the next token, providing a clear gauge of its inherent language modeling performance without the influence of instruction tuning.
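Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to each observed token. A minimal self-contained sketch (the probabilities below are made up for illustration):

```python
import math

def perplexity(token_probs):
    """PPL = exp(mean negative log-likelihood) over the observed tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that spreads probability uniformly over 4 choices
# assigns p = 0.25 to every token, giving a PPL of about 4.
print(perplexity([0.25, 0.25, 0.25]))
```

In practice, the per-token probabilities come from the model's softmax outputs over a held-out corpus, but the formula is exactly this.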
Main Results
| Dataset | Qwen2.5-0.5B | Kiwi-0.7B |
|---|---|---|
| Hellaswag | 44.82 | 83.74 |
| Arc Challenge | 41.92 | 59.50 |
| Open Book QA | 152.56 | 323.18 |
Hardware and Software
- Hardware: We used an NVIDIA A100 GPU to train the model
- Training Factors: The model was pretrained using a combination of the DeepSpeed library and the HuggingFace Trainer
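When combining DeepSpeed with the HuggingFace Trainer, the DeepSpeed settings are supplied as a JSON config (or an equivalent Python dict). The fragment below is purely illustrative, with hypothetical batch-size and precision values, not the authors' actual training configuration:

```python
# Illustrative DeepSpeed ZeRO Stage 2 config fragment; all values are
# hypothetical. It would typically be passed to the HuggingFace Trainer
# via TrainingArguments(deepspeed="ds_config.json") or as a dict.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```

ZeRO Stage 2 partitions optimizer states and gradients across devices, which keeps memory usage manageable during continued pretraining.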
Contact Us
EmpirischTech/ChaperoneAI: Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► Get in touch