-
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 177 -
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
Paper • 2502.14739 • Published • 94 -
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper • 2502.14502 • Published • 83 -
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 18
Collections
Discover the best community collections!
Collections including paper arxiv:2502.19613
-
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Paper • 2503.01743 • Published • 65 -
Phi-4 Technical Report
Paper • 2412.08905 • Published • 111 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 59 -
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Paper • 2503.01688 • Published • 19
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 25 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 26 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 108 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 37 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 341 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 52 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 25
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 68 -
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 49 -
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper • 2410.23090 • Published • 54 -
RARe: Retrieval Augmented Retrieval with In-Context Examples
Paper • 2410.20088 • Published • 5
-
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Paper • 2408.10188 • Published • 51 -
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Paper • 2408.08872 • Published • 99 -
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 126 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 57