NovaSky-AI/Sky-T1-32B-Preview

NovaSkyAI committed 1 day ago

Commit f672738 (verified) · 1 Parent(s): d637751

Update README.md

Files changed (1)

README.md +8 -6

@@ -36,12 +36,14 @@ We use Llama-Factory for training. On 8 H100, the training takes 19 hours with D
 ## Evaluation
-| Model                  | Math500 | AIME2024 | LiveCodeBench-Easy | LiveCodeBench-Medium | LiveCodeBench-Hard | GPQA-Diamond |
-|------------------------|---------|----------|---------------------|----------------------|--------------------|--------------|
-| Qwen-2.5-32B-Instruct  | 85.2    | 16.7     | 82.4                | 40.0                 | 8.9                | 42.9         |
-| Sky-T1                 | 88.6    | 43.3     | 87.9                | 54.4                 | 17.1               | 53.5         |
-| QwQ                    | 90.6    | 50.0     | 88.7                | 57.3                 | 17.9               | 56.6         |
-| o1-preview             | 85.5    | 46.6     | 92.0                | 56.6                 | 13.8               | 73.3         |
 ## Acknowledgement
 We would like to thanks the compute resources from [Lambda Lab](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [AnyScale](https://www.anyscale.com/). We would like to thanks the academic feedback and support from the [Still-2 Team](https://arxiv.org/pdf/2412.09413), and [Junyang Lin](https://justinlin610.github.io/) from the [Qwen Team](https://qwenlm.github.io/).

 ## Evaluation
+|               | Sky-T1-32B-Preview | Qwen-2.5-32B-Instruct | QwQ   | o1-preview |
+|-----------------------|---------------------|--------|-------|------------|
+| Math500              | 82.4                    | 76.2    | 85.4 | 81.4       |
+| AIME2024             | 43.3                    | 16.7    | 50.0  | 40.0       |
+| LiveCodeBench-Easy   | 86.3                    | 84.6   | 90.7  | 92.9       |
+| LiveCodeBench-Medium | 56.8                    | 40.8   | 56.3  | 54.9       |
+| LiveCodeBench-Hard   | 17.9                    | 9.8   | 17.1  | 16.3       |
+| GPQA-Diamond         | 56.8                    | 45.5   | 52.5  | 75.2       |
 ## Acknowledgement
 We would like to thanks the compute resources from [Lambda Lab](https://lambdalabs.com/service/gpu-cloud?srsltid=AfmBOop5FnmEFTkavVtdZDsLWvHWNg6peXtat-OXJ9MW5GMNsk756PE5) and [AnyScale](https://www.anyscale.com/). We would like to thanks the academic feedback and support from the [Still-2 Team](https://arxiv.org/pdf/2412.09413), and [Junyang Lin](https://justinlin610.github.io/) from the [Qwen Team](https://qwenlm.github.io/).