Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs Paper • 2312.17080 • Published Dec 28, 2023 • 1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 54
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25 • 7
A Thorough Examination of Decoding Methods in the Era of LLMs Paper • 2402.06925 • Published Feb 10 • 1