Daily Papers

by AK and the research community

Apr 2

Submitted by

ChocoWu

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

·
11 authors

3

Submitted by

zhiyuanhucs

JudgeLRM: Large Reasoning Models as a Judge

·
7 authors

2

Submitted by

tarsur909

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

·
9 authors

1

Submitted by

ChenYi99

Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1

·
7 authors

2

Submitted by

weizhiwang

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

·
5 authors

6

Submitted by

akhaliq

Z1: Efficient Test-time Scaling with Code

·
5 authors

2

Submitted by

akhaliq

Command A: An Enterprise-Ready Large Language Model

·
226 authors

2

Submitted by

wbhu-tc

GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors

·
6 authors

1

Submitted by

xw-eric

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

·
6 authors

1

Submitted by

YuchengShi

Towards Trustworthy GUI Agents: A Survey

·
5 authors

2

Submitted by

akhaliq

Multi-Token Attention

·
4 authors

Submitted by

pabloruizponce

MixerMDM: Learnable Composition of Human Motion Diffusion Models

·
5 authors

1

Submitted by

akhaliq

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

·
7 authors

3

Submitted by

Ray121381

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

·
10 authors

1

Submitted by

ColorfulAI

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

·
6 authors

1

Submitted by

tsbpp

Scaling Language-Free Visual Representation Learning

·
11 authors

3

Submitted by

hbXNov

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

·
7 authors

Submitted by

deepkyu

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

·
9 authors

Submitted by

carboncoo

AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization

·
12 authors

2

Submitted by

AndrewZhou924

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

·
8 authors

1

Submitted by

akhaliq

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead

·
11 authors

Submitted by

xk-huang

m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models

·
5 authors

1

Submitted by

akhaliq

Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs

·
4 authors

Submitted by

MaksimSTW

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

·
9 authors

1

Submitted by

MrezaPRZ

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

·
8 authors

1

Submitted by

onandon

DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting

·
2 authors

1

Submitted by

akhaliq

ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning

·
5 authors

Submitted by

sumuks

YourBench: Easy Custom Evaluation Sets for Everyone

·
6 authors

1

Submitted by

rdkarim

MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing

·
3 authors

1