MastermindEval Collection: Prompting and multiple-choice (MCQ) benchmarks to evaluate the reasoning capabilities of LLMs using Mastermind. • 9 items • Updated 2 days ago