-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 37 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 36 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 46
Collections
Discover the best community collections!
Collections including paper arxiv:2502.06772
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 106 -
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Paper • 2501.10120 • Published • 43 -
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 29 -
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario
Paper • 2501.10132 • Published • 19
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 25 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 26 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 108 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
Paper • 2412.13018 • Published • 41 -
Retrieval-augmented Large Language Models for Financial Time Series Forecasting
Paper • 2502.05878 • Published • 39 -
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper • 2502.06772 • Published • 21
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 54 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 40 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 50 -
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper • 2410.02707 • Published • 48 -
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 46 -
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
Paper • 2502.06772 • Published • 21