---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- SmallThinker-3B-Preview
---
Quantizations of https://huggingface.co/PowerInfer/SmallThinker-3B-Preview
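
If you only need one quant, `huggingface-cli` can fetch a single file without cloning the repo. A minimal sketch, assuming a hypothetical repo id and a Q4_K_M filename; check this repo's file listing for the actual names:

```bash
# Fetch a single GGUF file; <this-repo-id> and the filename are placeholders,
# so substitute the names shown in this repo's file listing.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <this-repo-id> SmallThinker-3B-Preview.Q4_K_M.gguf --local-dir .
```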

### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp) (example below)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [ollama](https://github.com/ollama/ollama)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [jan](https://github.com/janhq/jan)
* [GPT4All](https://github.com/nomic-ai/gpt4all)
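
As a quick smoke test with llama.cpp, an invocation might look like the sketch below; the GGUF filename is an assumption, and `-ngl` only applies to GPU-enabled builds:

```bash
# Interactive chat with llama.cpp; the filename is an assumption, so point
# -m at whichever quant you actually downloaded.
./llama-cli -m SmallThinker-3B-Preview.Q4_K_M.gguf \
  -cnv -p "You are a helpful assistant." \
  -ngl 99   # offload all layers to the GPU; omit on CPU-only builds
```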

---

# From original readme

We introduce **SmallThinker-3B-Preview**, a new model fine-tuned from [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).

You can now deploy SmallThinker directly on your phone with [PowerServe](https://github.com/powerserve-project/PowerServe).

## Benchmark Performance

| Model | AIME24 | AMC23 | GAOKAO2024_I | GAOKAO2024_II | MMLU_STEM | AMPS_Hard | math_comp |
|---------|--------|-------|--------------|---------------|-----------|-----------|-----------|
| Qwen2.5-3B-Instruct | 6.67 | 45 | 50 | 35.8 | 59.8 | - | - |
| SmallThinker | 16.667 | 57.5 | 64.2 | 57.1 | 68.2 | 70 | 46.8 |
| GPT-4o | 9.3 | - | - | - | 64.2 | 57 | 50 |

Limitation: due to SmallThinker's current limitations in instruction following, for math_comp we adopt a more lenient evaluation that only requires a correct answer, without constraining responses to follow the specified AAAAA format.

Colab Link: [Colab](https://colab.research.google.com/drive/182q600at0sVw7uX0SXFp6bQI7pyjWXQ2?usp=sharing)

## Intended Use Cases

SmallThinker is designed for the following use cases:

1. **Edge Deployment:** Its small size makes it ideal for deployment on resource-constrained devices.
2. **Draft Model for QwQ-32B-Preview:** SmallThinker can serve as a fast and efficient draft model for the larger QwQ-32B-Preview model. In my tests with llama.cpp, this gives roughly a 75% speedup (from 40 tokens/s to 70 tokens/s); see the sketch after this list.
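
A minimal sketch of that setup with llama.cpp's `llama-speculative` example binary, assuming hypothetical Q4_K_M files for both models (drafting also requires the two models to share a compatible vocabulary):

```bash
# Speculative decoding: QwQ-32B-Preview as the target model, SmallThinker
# as the draft. Filenames and the --draft depth are illustrative.
./llama-speculative \
  -m  QwQ-32B-Preview.Q4_K_M.gguf \
  -md SmallThinker-3B-Preview.Q4_K_M.gguf \
  --draft 16 \
  -p "Prove that there are infinitely many primes."
```

The realized speedup depends on how many drafted tokens the target model accepts, so the `--draft` depth is worth tuning per workload.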