akjindal53244 committed
Commit 0ff6652
Add model files and update README
Browse files:
- .gitattributes +35 -0
- Llama-3.1-Storm-8B.Q4_K_M.gguf +3 -0
- Llama-3.1-Storm-8B.Q5_K_M.gguf +3 -0
- Llama-3.1-Storm-8B.Q6_K.gguf +3 -0
- Llama-3.1-Storm-8B.Q8_0.gguf +3 -0
- README.md +153 -0
- config.json +3 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Storm-8B.Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9132a94ae3441cd18132e94222d8e6b12d5f30627cfc0c46a27aa50551b49fa3
size 4920734496
Llama-3.1-Storm-8B.Q5_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a817f5faf2cdee455563913882ba2d63b74c1ba6317b8441341faae1a9f458b
size 5732987680
Llama-3.1-Storm-8B.Q6_K.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58bced62244245319393fb2992a0d4ad57a39999d20dd126ec14cd356fdee493
size 6596006688
Llama-3.1-Storm-8B.Q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7e2c522af01158c9a427b350c55119d96517462dcfb34a71ccdd0dca6f07705
size 8540771104
README.md
ADDED
@@ -0,0 +1,153 @@
---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: text-generation
tags:
- llama-3.1
- conversational
- instruction following
- reasoning
- function calling
license: llama3.1
base_model: akjindal53244/Llama-3.1-Storm-8B
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/tmOlbERGKP7JSODa6T06J.jpeg)

Authors: [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Pawan Kumar Rajpoot](https://www.linkedin.com/in/pawanrajpoot/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/), [Akshita Sukhlecha](https://www.linkedin.com/in/akshita-sukhlecha/)

**🤗 Hugging Face Announcement Blog**: https://huggingface.co/blog/akjindal53244/llama31-storm8b

<br>

# Llama-3.1-Storm-8B-GGUF
**This is the GGUF quantized version of [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), for use with [llama.cpp](https://github.com/ggerganov/llama.cpp). The BF16 model is available [here](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B).**

## TL;DR
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c75c1237333ccfef30a602/mDtDeiHwnBupw1k_n99Lf.png)

We present [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), a model that significantly outperforms Meta AI's [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) across diverse benchmarks, as shown in the performance comparison plot in the next section. Our approach consists of three key steps:
1. **Self-Curation**: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples. **Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation instead of larger models (e.g. 70B, 405B).**
2. **Targeted fine-tuning**: We performed [Spectrum](https://arxiv.org/abs/2406.06623)-based targeted fine-tuning over the Llama-3.1-8B-Instruct model. The Spectrum method accelerates training by selectively training layer modules with a high signal-to-noise ratio (SNR) and freezing the remaining modules. In our work, 50% of the layers were frozen.
3. **Model Merging**: We merged our fine-tuned model with the [Llama-Spark](https://huggingface.co/arcee-ai/Llama-Spark) model using the [SLERP](https://huggingface.co/blog/mlabonne/merge-models#1-slerp) method, which produces a blended model whose characteristics are smoothly interpolated from both parents. [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks covering instruction following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
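The SLERP merge in step 3 can be illustrated with a small sketch. This is not the exact merge recipe used here (production merges, e.g. with mergekit, interpolate per-parameter and often vary the interpolation factor per layer); it only shows the spherical interpolation formula on two flattened weight tensors, with illustrative names:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight tensors."""
    # Angle between the two vectors, computed on normalized copies
    v0n = v0 / np.linalg.norm(v0)
    v1n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    # Weights follow the great-circle arc instead of the straight chord
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1

# Hypothetical flattened weights of the two parent models
w_finetuned = np.array([1.0, 0.0])
w_spark = np.array([0.0, 1.0])
w_merged = slerp(0.5, w_finetuned, w_spark)
```

Unlike a plain average, SLERP preserves the norm of unit vectors along the interpolation path, which is why merged weights tend to stay in a well-behaved region of parameter space.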

## Introducing Llama-3.1-Storm-8B
[**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) builds upon the foundation of Llama-3.1-8B-Instruct, aiming to enhance both conversational and function-calling capabilities within the 8B parameter model class.

As shown in the left subplot of the above figure, [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Meta-Llama-3.1-8B-Instruct across various benchmarks: instruction following ([IFEval](https://arxiv.org/abs/2311.07911)), knowledge-driven QA ([GPQA](https://arxiv.org/abs/2311.12022), [MMLU-Pro](https://arxiv.org/pdf/2406.01574)), reasoning ([ARC-C](https://arxiv.org/abs/1803.05457), [MuSR](https://arxiv.org/abs/2310.16049), [BBH](https://arxiv.org/pdf/2210.09261)), reduced hallucinations ([TruthfulQA](https://arxiv.org/abs/2109.07958)), and function calling ([BFCL](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)). This improvement is particularly significant for AI developers and enthusiasts working with limited computational resources.

We also benchmarked our model against the recently published [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B), which is likewise built on top of Llama-3.1-8B-Instruct. As shown in the right subplot of the above figure, **Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 out of 9 benchmarks**; Hermes-3-Llama-3.1-8B surpasses it on MuSR, and the two models perform comparably on BBH.

## Llama-3.1-Storm-8B Model Strengths
Llama-3.1-Storm-8B is a powerful generalist model useful for diverse applications. We invite the AI community to explore [Llama-3.1-Storm-8B](https://huggingface.co/collections/akjindal53244/storm-66ba6c96b7e24ecb592787a9) and look forward to seeing how it will be utilized in various projects and applications.

<table>
  <tr>
    <td><strong>Model Strength</strong></td>
    <td><strong>Relevant Benchmarks</strong></td>
  </tr>
  <tr>
    <td>Improved Instruction Following</td>
    <td>IFEval Strict (+3.93%)</td>
  </tr>
  <tr>
    <td>Enhanced Knowledge-Driven Question Answering</td>
    <td>GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)</td>
  </tr>
  <tr>
    <td>Better Reasoning</td>
    <td>ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)</td>
  </tr>
  <tr>
    <td>Superior Agentic Capabilities</td>
    <td>BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)</td>
  </tr>
  <tr>
    <td>Reduced Hallucinations</td>
    <td>TruthfulQA (+9%)</td>
  </tr>
</table>

**Note**: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct.


## Llama-3.1-Storm-8B Models
1. `BF16`: [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B)
2. `FP8`: [Llama-3.1-Storm-8B-FP8-Dynamic](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic)
3. `GGUF`: [Llama-3.1-Storm-8B-GGUF](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-GGUF)

## How to Use the GGUF Model

```bash
pip install llama-cpp-python
```

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Download the GGUF model
model_name = "akjindal53244/Llama-3.1-Storm-8B-GGUF"
model_file = "Llama-3.1-Storm-8B.Q8_0.gguf"  # the 8-bit quant used in this example; other quantization levels are available in the model repo
model_path = hf_hub_download(model_name, filename=model_file)

## Instantiate model from downloaded file
llm = Llama(
    model_path=model_path,
    n_ctx=16000,     # Context length to use
    n_threads=32,    # Number of CPU threads to use
    n_gpu_layers=0   # Number of model layers to offload to GPU
)

generation_kwargs = {
    "max_tokens": 200,
    "stop": ["<|eot_id|>"],
    "echo": False,  # Don't echo the prompt in the output
    "top_k": 1      # Greedy decoding; set this value > 1 for sampling
}

prompt = "What is 2+2?"
res = llm(prompt, **generation_kwargs)
print(res["choices"][0]["text"])
```
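The snippet above feeds the model a raw string. Llama-3.1-derived instruct models are trained with a specific chat template, so wrapping prompts in it typically yields better responses. Below is a minimal sketch of that format; the helper name is ours, and alternatively llama-cpp-python can apply the template itself via `llm.create_chat_completion(messages=...)`:

```python
def build_llama31_prompt(user_msg: str, system_msg: str = "You are a helpful assistant.") -> str:
    # Llama-3.1 chat format: header tokens delimit each role's turn, and the
    # trailing assistant header cues the model to start its reply.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_msg}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama31_prompt("What is 2+2?")
# res = llm(prompt, **generation_kwargs)  # then generate as in the example above
```

The `stop` value `<|eot_id|>` in `generation_kwargs` matches this template: it is the end-of-turn token the model emits when its reply is complete.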


## Alignment Note
While **Llama-3.1-Storm-8B** did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model.

## Cite Our Work
```
@misc{ashvini_kumar_jindal_2024,
    author    = { Ashvini Kumar Jindal and Pawan Kumar Rajpoot and Ankur Parikh and Akshita Sukhlecha },
    title     = { Llama-3.1-Storm-8B },
    year      = 2024,
    url       = { https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B },
    doi       = { 10.57967/hf/2902 },
    publisher = { Hugging Face }
}
```

## Support Our Work
With three team members spread across three different time zones, we have won the [NeurIPS LLM Efficiency Challenge 2023](https://llm-efficiency-challenge.github.io/) and four other competitions in the finance and Arabic LLM space. We have also published a [SOTA mathematical reasoning model](https://huggingface.co/akjindal53244/Arithmo-Mistral-7B).

**Llama-3.1-Storm-8B** is our most valuable contribution so far to the open-source community. We are committed to developing efficient generalist LLMs. **We're seeking both computational resources and innovative collaborators to drive this initiative forward.**
config.json
ADDED
@@ -0,0 +1,3 @@
{
  "model_type": "llama"
}