Performance Highlights

Hush-Qwen2.5-7B-Preview was created using the YoYo v3 merge technique. It scores 79.62% on the IFEVAL test, the second-highest result among 7B models. The leading model is currently unavailable, meaning we might be in first place by default!

Strengths

  • High IFEVAL Score: 79.62%, among the best for 7B models.
  • Well-rounded performance: Decent scores across various benchmarks.

Weaknesses

  • Low MATH Score: 37.54%, which is significantly lower than our past models (which scored at least 45%). Improving this would make the model substantially better overall.

Benchmark Results

Category | Score (%)
Average  | 35.13
IFEVAL   | 79.62
BBH      | 35.33
MATH     | 37.54
GPQA     | 8.17
MUSR     | 12.73
MMLU     | 37.38
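
As a quick sanity check on the table above, the Average row appears to be the unweighted mean of the six benchmark scores. That is an assumption inferred from the numbers, not something the model card states. A minimal sketch (the `mean_score` helper is hypothetical, written only for this illustration):

```python
# Scores copied from the benchmark table above (percent).
scores = {
    "IFEVAL": 79.62,
    "BBH": 35.33,
    "MATH": 37.54,
    "GPQA": 8.17,
    "MUSR": 12.73,
    "MMLU": 37.38,
}


def mean_score(results):
    """Unweighted arithmetic mean of the benchmark scores (assumed aggregation)."""
    return sum(results.values()) / len(results)


average = round(mean_score(scores), 2)
print(f"Average: {average}")  # agrees with the 35.13 reported in the table
```

Because the mean is unweighted, the high IFEVAL score and the low GPQA and MUSR scores each move the average by the same amount per point.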

Next Steps

  • Finetune on Math: Bringing up the math score is a priority to create a well-balanced model.
  • Explore YoYo v4: The next step could be merging this model with another one that is strong in math using the YoYo v4 technique. However, YoYo v4 lacks proper documentation, making it a challenge to implement.
  • Develop a Math-Strong Model: An alternative approach is to build a new model that performs decently in all benchmarks but excels in math, then merge it with this one.

Conclusion

Hush-Qwen2.5-7B-Preview is a strong contender in the IFEVAL category, achieving one of the highest scores among 7B models. However, improving the math benchmark is a key priority for future iterations. By either finetuning or leveraging new merge techniques like YoYo v4, we can push the model to new heights.

Model size: 7.61B params (Safetensors, tensor type BF16)