smol_llama-101M-GQA
A small decoder-only model with 101M parameters (total). This is the first version of the model.
Some cool anecdotes about this model:
- This checkpoint is the 'raw' pre-trained model and has not been tuned to a more specific task. It should be fine-tuned before use in most cases; a minimal loading sketch follows the citation below.
- A variant fine-tuned on pypi to generate Python code is also available - link.

If you find this experiment useful and would like to add some words to your .bib file, it would make us happy.
@misc {beespoke_data_2023,
author = { {Peter Szemraj and Vincent Haines} },
title = { smol_llama-101M-GQA (Revision 9c9c090) },
year = 2023,
url = { https://huggingface.co./BEE-spoke-data/smol_llama-101M-GQA },
doi = { 10.57967/hf/1440 },
publisher = { Hugging Face }
}
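As a concrete starting point for the "fine-tune before use" note above, here is a minimal sketch of loading the raw checkpoint for inference. It assumes the Hugging Face `transformers` and `torch` libraries; the repo id is taken from the citation URL above, while the prompt and sampling settings are illustrative assumptions, not recommendations from the authors.

```python
# Minimal sketch: load the raw smol_llama checkpoint and sample a continuation.
# Assumes `pip install transformers torch`; repo id from the citation URL above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "BEE-spoke-data/smol_llama-101M-GQA"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# This is an untuned base model, so expect raw text continuation,
# not instruction-following behavior.
prompt = "Once upon a time, a tiny language model"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,      # illustrative values, tune to taste
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At roughly 101M parameters, this runs comfortably on CPU; no GPU is required to try it out.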
Open LLM Leaderboard evaluation results (detailed results can be found here):
| Metric | Value |
|---|---|
| Avg. | 25.32 |
| ARC (25-shot) | 23.55 |
| HellaSwag (10-shot) | 28.77 |
| MMLU (5-shot) | 24.24 |
| TruthfulQA (0-shot) | 45.76 |
| Winogrande (5-shot) | 50.67 |
| GSM8K (5-shot) | 0.83 |
| DROP (3-shot) | 3.39 |