IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval • Paper • 2503.04644 • Published 3 days ago • 19
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • 2503.04130 • Published 4 days ago • 66
FLAME: A Federated Learning Benchmark for Robotic Manipulation • Paper • 2503.01729 • Published 6 days ago • 4
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs • Paper • 2503.02003 • Published 6 days ago • 37
ABC: Achieving Better Control of Multimodal Embeddings using VLMs • Paper • 2503.00329 • Published 9 days ago • 18
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content • Paper • 2503.02357 • Published 6 days ago • 7
Iterative Value Function Optimization for Guided Decoding • Paper • 2503.02368 • Published 6 days ago • 14
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents • Paper • 2502.18017 • Published 13 days ago • 18
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking • Paper • 2502.20730 • Published 10 days ago • 32
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning • Paper • 2502.20127 • Published 11 days ago • 9
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute • Paper • 2502.20126 • Published 11 days ago • 19
UniTok: A Unified Tokenizer for Visual Generation and Understanding • Paper • 2502.20321 • Published 10 days ago • 28
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving • Paper • 2502.20238 • Published 10 days ago • 24
Rank1: Test-Time Compute for Reranking in Information Retrieval • Paper • 2502.18418 • Published 12 days ago • 25
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems • Paper • 2502.19328 • Published 11 days ago • 21
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance • Paper • 2502.18772 • Published 12 days ago • 30
GCC: Generative Color Constancy via Diffusing a Color Checker • Paper • 2502.17435 • Published 13 days ago • 27
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution • Paper • 2502.18449 • Published 12 days ago • 67