Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.12143

Reasoning, Thinking, RL and Test-Time Scaling

about 6 hours ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 36
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 46

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Paper • 2502.19361 • Published 11 days ago • 26
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning

Paper • 2502.17407 • Published 13 days ago • 24
Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published 20 days ago • 28
Language Models can Self-Improve at State-Value Estimation for Better Search

Paper • 2503.02878 • Published 5 days ago • 7

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 18 days ago • 66
Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published 20 days ago • 28
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published 20 days ago • 11
Large Language Diffusion Models

Paper • 2502.09992 • Published 24 days ago • 99

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published 17 days ago • 44
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published 19 days ago • 28
Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published 24 days ago • 17
Distillation Scaling Laws

Paper • 2502.08606 • Published 25 days ago • 46

Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

Paper • 2502.15086 • Published 17 days ago • 15
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Paper • 2502.14502 • Published 17 days ago • 83
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Paper • 2502.14258 • Published 18 days ago • 25
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning

Paper • 2502.12853 • Published 19 days ago • 28

Large Language Diffusion Models

Paper • 2502.09992 • Published 24 days ago • 99
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

Paper • 2502.10391 • Published 23 days ago • 31
Diverse Inference and Verification for Advanced Reasoning

Paper • 2502.09955 • Published 24 days ago • 17
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Paper • 2502.08130 • Published 26 days ago • 9

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 26
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Llms and reasoning

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16 • 37
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 341
Chain-of-Retrieval Augmented Generation

Paper • 2501.14342 • Published Jan 24 • 52
RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 25

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 58
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 42
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 57

about 8 hours ago

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14, 2024 • 55
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Paper • 2403.12968 • Published Mar 19, 2024 • 25
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 69
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 77

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs