Independent Evaluation of QwQ-32B-Preview on MMLU-Pro Benchmark
Dear Qwen team and Hugging Face community,
We're excited to share our independent evaluation of QwQ-32B-Preview, run with our own implementation of the MMLU-Pro benchmark. You can find the detailed results here.
Our results show QwQ-32B-Preview performing strongly across a wide range of categories, particularly in challenging domains such as Logic (general, propositional logic), Math (number theory, abstract algebra, probability, group theory, combinatorics), Business & Finance, Nuclear Physics, Molecular Biology, Nutrition, and Computer Science/Programming.
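For anyone who wants to run a comparable check themselves, here is a rough, minimal sketch of a zero-shot MMLU-Pro evaluation loop in Python. This is not our actual harness: the dataset id (TIGER-Lab/MMLU-Pro), the field names (question, options, answer), the prompt wording, and the answer-extraction regex are assumptions you may need to adjust for your setup.

```python
# Minimal sketch of a zero-shot MMLU-Pro evaluation loop (illustrative, not our
# exact harness). Requires a recent `transformers` release with chat-format
# support in the text-generation pipeline, and a GPU large enough for a 32B model.
import re
from datasets import load_dataset
from transformers import pipeline

LETTERS = "ABCDEFGHIJ"  # MMLU-Pro questions can have up to ten options

generator = pipeline(
    "text-generation",
    model="Qwen/QwQ-32B-Preview",
    device_map="auto",
    torch_dtype="auto",
)

# Dataset id and field names are assumptions; check the dataset card.
dataset = load_dataset("TIGER-Lab/MMLU-Pro", split="test")

n_examples = 50  # small slice for illustration only
correct = 0
for example in dataset.select(range(n_examples)):
    choices = "\n".join(
        f"{LETTERS[i]}. {opt}" for i, opt in enumerate(example["options"])
    )
    prompt = (
        f"Question: {example['question']}\n{choices}\n"
        "Think step by step, then give your final answer as a single letter."
    )
    output = generator(
        [{"role": "user", "content": prompt}],
        max_new_tokens=1024,  # QwQ produces long chains of reasoning
        do_sample=False,
    )[0]["generated_text"][-1]["content"]
    # Take the last standalone capital letter as the predicted answer.
    matches = re.findall(r"\b([A-J])\b", output)
    predicted = matches[-1] if matches else None
    correct += predicted == example["answer"]

print(f"Accuracy on the slice: {correct / n_examples:.2%}")
```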
For a deeper dive into specific categories and subcategories, use the 'Unity Subjects' tab. Filter by category to view aggregate scores.
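As a rough illustration of how per-category aggregates like the ones in that tab can be computed offline, here is a small pandas sketch. The file name and column names (category, subcategory, correct) are hypothetical placeholders for whatever per-question records your own run produces.

```python
# Hypothetical sketch of aggregating per-question results into category scores.
import pandas as pd

records = pd.read_csv("qwq_32b_preview_mmlu_pro_results.csv")  # placeholder path

# Mean accuracy per top-level category.
by_category = (
    records.groupby("category")["correct"].mean().sort_values(ascending=False)
)
print(by_category.to_string(float_format="{:.1%}".format))

# Drill down into subcategories within a single category, e.g. Math.
math_breakdown = (
    records[records["category"] == "Math"].groupby("subcategory")["correct"].mean()
)
print(math_breakdown.to_string(float_format="{:.1%}".format))
```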
We believe this evaluation provides valuable insights into the model's capabilities and hope you find it useful.
We plan to share more benchmarks in the future.