Announcing WebApp1K-Duo: onekq-ai/WebApp1K-Duo-React

This is to keep up the challenge now that OpenAI o1 models have saturated the WebApp1K benchmark. On the new benchmark, the state-of-the-art score drops to 67%. Let the hill climbing commence! Leaderboard: onekq-ai/WebApp1K-models-leaderboard

PS: I will publish more findings soon.
DeepSeek 2.5 is hands-down the best open-source model, leaving its peers far behind. It even beats GPT-4o mini. Leaderboard: onekq-ai/WebApp1K-models-leaderboard

Inference through the official API is painfully slow, though. I heard the team is short on GPUs (well, who isn't?).