
Yi Cui

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model



onekq's activity

updated a Space 1 day ago

WebApp1K Models Leaderboard

🥇

Display pass@k metrics for models and scenarios

posted an update 2 days ago


QwQ-32B is amazing!

It ranks below o1-preview, but beats DeepSeek v3 and all Gemini models.
https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

Now that we have such a powerful model that fits on a single GPU, can someone fine-tune a web app model to push the SOTA on my leaderboard? 🤗

posted an update 3 days ago


From my own experience, these are the pain points for reasoning model adoption.

(1) Reasoning models are expensive and, even worse, slow, due to excessive token output. You need to 10x your max output length to avoid clipping the thinking process.

(2) You have to filter out the thinking tokens to retrieve the final output. For mature workflows, this means broad or deep refactoring.

First-party (1p) vendors, open-source and proprietary alike, ease these pain points by manipulating their own models. But the problems are exposed when the reasoning model is hosted by third-party (3p) MaaS providers.
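
To make pain point (2) concrete, here is a minimal sketch of the kind of filter a workflow ends up needing. It assumes the model wraps its reasoning in `<think>...</think>` markers, which is a convention some open reasoning models follow but is not guaranteed for every model or provider. The function name `split_reasoning` is hypothetical. The sketch also flags the clipping case from pain point (1), where the output limit is hit before the thinking block closes.

```python
# Assumption: reasoning is delimited by <think> ... </think>.
# Other models or providers may use different markers or a separate field.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(raw: str) -> tuple[str, str, bool]:
    """Split raw model output into (thinking, answer, truncated).

    truncated is True when a thinking block was opened but never closed,
    which usually means the max output length clipped the response.
    """
    close_idx = raw.find(THINK_CLOSE)
    if close_idx != -1:
        # Everything before the first closing marker is reasoning.
        # The opening marker may be absent if the chat template
        # already emitted it as part of the prompt.
        thinking = raw[:close_idx].replace(THINK_OPEN, "", 1)
        answer = raw[close_idx + len(THINK_CLOSE):]
        return thinking.strip(), answer.strip(), False
    if THINK_OPEN in raw:
        # Opened but never closed: there is no final answer to retrieve.
        thinking = raw.split(THINK_OPEN, 1)[1]
        return thinking.strip(), "", True
    # No markers at all: treat the whole output as the answer.
    return "", raw.strip(), False


if __name__ == "__main__":
    thinking, answer, truncated = split_reasoning(
        "<think>Plan the component first.</think>\nexport default App;"
    )
    print(repr(answer), truncated)
```

A caller that sees `truncated` set would typically retry with a larger output limit rather than pass an empty answer downstream, which is where the refactoring cost comes from.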