Open CoT Leaderboard

community


AI & ML interests

Chain of Thought, LLM Evaluation

cot-leaderboard's activity

yakazimir

authored a paper about 2 months ago

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Paper • 2409.07440 • Published Sep 11 • 6

ggbetz

updated a Space about 2 months ago

Open CoT Dashboard

ggbetz

updated a dataset about 2 months ago

cot-leaderboard/cot-leaderboard-requests

Preview • Updated Nov 2 • 571

ggbetz

in cot-leaderboard/cot-leaderboard-results about 2 months ago

Update leaderboard for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#131 opened about 2 months ago by

ggbetz

in cot-leaderboard/cot-eval-results about 2 months ago

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1023 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1024 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1025 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1026 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1027 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1028 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1029 opened about 2 months ago by

Upload results for model HuggingFaceTB/SmolLM2-1.7B-Instruct

#1030 opened about 2 months ago by

ggbetz

updated 3 datasets about 2 months ago

cot-leaderboard/cot-leaderboard-results

Viewer • Updated Nov 2 • 123 • 8.05k

cot-leaderboard/cot-eval-results

Updated Nov 2 • 18.5k

cot-leaderboard/cot-eval-traces-2.0

Viewer • Updated Nov 1 • 3.42M • 51.5k • 3

ggbetz

posted an update 4 months ago

Post

1457

Hi, just a brief follow-up on our Guided Reasoning (GuiR) system:

I've created a template space that facilitates testing:

1. Duplicate space logikon/guir-chat
2. Set up your own inference servers and provide their details in the config file
3. Add API keys as secrets
4. Your personal GuiR playground is ready

Cheers, Gregor
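
Steps 1 and 3 of the post above can also be scripted. The following is a minimal sketch, not part of the original post: it assumes `api` is a `huggingface_hub.HfApi` instance (its `duplicate_space` and `add_space_secret` methods are used), and the helper name `create_playground`, the target Space ID, and the secret names are hypothetical. Step 2 (editing the config file for your inference servers) still has to be done by hand.

```python
from typing import Any, Mapping


def create_playground(
    api: Any,
    target_space: str,
    secrets: Mapping[str, str],
    source_space: str = "logikon/guir-chat",
) -> None:
    """Duplicate the template Space and store API keys as Space secrets.

    `api` is assumed to behave like `huggingface_hub.HfApi`.
    """
    # Step 1: copy the template Space into your own namespace.
    api.duplicate_space(from_id=source_space, to_id=target_space, private=True)

    # Step 3: register each API key as a secret of the new Space.
    # The secret names are whatever the Space's config expects (not specified here).
    for key, value in secrets.items():
        api.add_space_secret(repo_id=target_space, key=key, value=value)
```

With `huggingface_hub` installed and a logged-in token, a call might look like `create_playground(HfApi(), "your-username/guir-chat", {"MY_API_KEY": "secret-value"})`, where both the Space ID and the secret name are placeholders.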

ggbetz

posted an update 4 months ago

Post

1179

🧭 Guided Reasoning

👋Hi everyone,

We've released Guided Reasoning:

Our AI guides walk your favorite LLM through complex reasoning problems.

🎯 Goals:

1️⃣ Reliability. AIs consistently follow reasoning methods.
2️⃣ Self-explainability. AIs see reasoning protocols and can explain internal deliberation.
3️⃣ Contestability. Users may amend AI reasoning and revise plausibility assessments.

Try out Guided Reasoning with our light demo chatbot, powered by 🤗 HuggingFace's free Inference API and small LLMs. (Sorry for the poor latency and limited availability; we are currently searching for 💸 compute sponsors so that we can run more powerful models faster and optimize guided reasoning performance.)

Built on top of Logikon's open-source AI reasoning analytics.

Demo chat app: logikon/benjamin-chat
GitHub: https://github.com/logikon-ai/logikon
Technical report: https://arxiv.org/abs/2408.16331

➡️ Check it out and get involved! Looking forward to hearing from you.

ggbetz

posted an update 9 months ago

Post

1440

🥇Open CoT Leaderboard

We're delighted to announce the [Open CoT Leaderboard](logikon/open_cot_leaderboard) on 🤗 Spaces.

Unlike other LLM performance leaderboards, the Open CoT Leaderboard is not tracking absolute benchmark accuracies, but relative **accuracy gains** due to **chain-of-thought**.
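
To make the distinction concrete, here is a small illustrative sketch. It assumes the gain is the simple difference between accuracy with and without chain-of-thought; the leaderboard's exact definition may differ, and the model names and numbers below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model: str
    acc_base: float  # accuracy when the model answers directly, without CoT
    acc_cot: float   # accuracy when a reasoning trace precedes the answer


def accuracy_gain(result: EvalResult) -> float:
    """Accuracy gain attributed to chain-of-thought (assumed: simple difference)."""
    return result.acc_cot - result.acc_base


def rank_by_gain(results: list[EvalResult]) -> list[EvalResult]:
    """Order models by CoT gain, largest first, rather than by absolute accuracy."""
    return sorted(results, key=accuracy_gain, reverse=True)


# Hypothetical numbers: model-a is more accurate overall,
# but model-b benefits more from chain-of-thought.
results = [
    EvalResult("model-a", acc_base=0.70, acc_cot=0.72),
    EvalResult("model-b", acc_base=0.50, acc_cot=0.60),
]
ranking = [r.model for r in rank_by_gain(results)]
```

Under this assumption, a model with lower absolute accuracy can still rank first if reasoning improves its answers more.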

Eval datasets that underpin the leaderboard are hosted [here](https://huggingface.co/cot-leaderboard).

Feedback and suggestions are more than welcome.

@clefourrier


yakazimir

authored 2 papers 11 months ago

Polyglot Semantic Parsing in APIs

Paper • 1803.06966 • Published Mar 19, 2018

The Code2Text Challenge: Text Generation in Source Code Libraries

Paper • 1708.00098 • Published Jul 31, 2017