Kiwi-1.0-0.7B-32k

Pretrained base model · qwen2 architecture · 703M parameters · BF16 Safetensors weights

Main Message

We present our initial results validating depth up-scaling (DUS), a method that combines depthwise scaling with continued pretraining. Unlike other LLM up-scaling approaches that rely on mixture-of-experts, DUS requires no complex modifications for efficient training and inference, making it a simple yet effective strategy for scaling smaller models into high-performance LLMs.

In our approach, we carefully selected the dense layers of Qwen2.5-0.5B to construct our model. Notably, while Qwen2.5-0.5B was trained on 18 trillion tokens, our model was trained on only 5 billion tokens (over three orders of magnitude fewer), yet it achieves comparable performance.
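As a rough illustration of what depthwise scaling looks like in practice, the sketch below deepens the decoder stack of Qwen/Qwen2.5-0.5B with Hugging Face transformers. The overlap size `k` and the front/back splice (the SOLAR-style pattern) are illustrative assumptions, not the exact layer-selection recipe used to build Kiwi-1.0-0.7B-32k.

```python
import copy
import torch
from torch import nn
from transformers import AutoModelForCausalLM

# Minimal sketch of depthwise scaling, assuming Qwen/Qwen2.5-0.5B as the base.
# The overlap size and splice scheme below are illustrative only.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", torch_dtype=torch.bfloat16
)

layers = base.model.layers   # nn.ModuleList of Qwen2 decoder blocks
n = len(layers)              # 24 for Qwen2.5-0.5B
k = n // 3                   # hypothetical overlap between the two halves

# Splice the first (n - k) layers with the last (n - k) layers, duplicating
# the middle of the stack.
spliced = [copy.deepcopy(l) for l in list(layers[: n - k]) + list(layers[k:])]

scaled = copy.deepcopy(base)
scaled.model.layers = nn.ModuleList(spliced)
scaled.config.num_hidden_layers = len(spliced)

# Re-index the attention modules so the KV cache maps to the right layer.
for i, layer in enumerate(scaled.model.layers):
    layer.self_attn.layer_idx = i

# The deeper model is then trained further (continued pretraining) so the
# duplicated layers can re-specialize.
print(base.config.num_hidden_layers, "->", scaled.config.num_hidden_layers)
```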

Note: This model has not yet been instruction-tuned; instruction tuning is an area of ongoing development.

Evaluation Results

Perplexity as an Evaluation Metric

Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A lower perplexity score indicates better performance (i.e., the model is more confident in its predictions). Perplexity directly measures a model's ability to predict the next token, providing a clear gauge of its inherent language modeling performance without the influence of instruction tuning.
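Concretely, perplexity is the exponential of the average per-token negative log-likelihood. The snippet below shows one way to compute it with Hugging Face transformers, assuming the checkpoint loads through the standard Qwen2 classes; it illustrates the metric itself, not the exact evaluation pipeline behind the numbers in the next section.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch of computing perplexity for a single text.
model_id = "empirischtech/Kiwi-1.0-0.7B-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Perplexity measures how well a language model predicts a sample."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean cross-entropy
    # over the predicted tokens (inputs are shifted internally by one).
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# PPL = exp(mean negative log-likelihood per token); lower is better.
print(f"Perplexity: {math.exp(loss.item()):.2f}")
```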

Main Results

| Dataset       | Qwen2.5-0.5B | Kiwi-0.7B |
|---------------|--------------|-----------|
| HellaSwag     | 44.82        | 83.74     |
| ARC Challenge | 41.92        | 59.50     |
| OpenBookQA    | 152.56       | 323.18    |

Hardware and Software

Contact Us

EmpirischTech / ChaperoneAI: Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► Get in touch

