- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 37
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 36
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 46

Collections including paper arxiv:2502.12143
- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
  Paper • 2502.19361 • Published • 26
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
  Paper • 2502.17407 • Published • 24
- Small Models Struggle to Learn from Strong Reasoners
  Paper • 2502.12143 • Published • 28
- Language Models can Self-Improve at State-Value Estimation for Better Search
  Paper • 2503.02878 • Published • 7

- Slamming: Training a Speech Language Model on One GPU in a Day
  Paper • 2502.15814 • Published • 66
- Small Models Struggle to Learn from Strong Reasoners
  Paper • 2502.12143 • Published • 28
- HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
  Paper • 2502.12574 • Published • 11
- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 99

- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
  Paper • 2502.14768 • Published • 44
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 28
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 17
- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 46

- Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
  Paper • 2502.15086 • Published • 15
- How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
  Paper • 2502.14502 • Published • 83
- Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
  Paper • 2502.14258 • Published • 25
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 28

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 99
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
  Paper • 2502.10391 • Published • 31
- Diverse Inference and Verification for Advanced Reasoning
  Paper • 2502.09955 • Published • 17
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
  Paper • 2502.08130 • Published • 9

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 25
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 26
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 108
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
  Paper • 2501.09686 • Published • 37
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 341
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 52
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 25

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 58
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 42
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 57

- Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
  Paper • 2403.09029 • Published • 55
- LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
  Paper • 2403.12968 • Published • 25
- RAFT: Adapting Language Model to Domain Specific RAG
  Paper • 2403.10131 • Published • 69
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 77