Daily Papers

by AK and the research community

Apr 3

Submitted by

Juanxi

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

·
11 authors

5

Submitted by

zhijie3

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

·
7 authors

2

Submitted by

Howe666

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

·
5 authors

1

Submitted by

akhaliq

DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

·
6 authors

4

Submitted by

8ruceLi

Towards Physically Plausible Video Generation via VLM Planning

·
11 authors

2

Submitted by

lkevinzc

Understanding R1-Zero-Like Training: A Critical Perspective

·
8 authors

2

Submitted by

hanyang-21

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step

·
4 authors

1

Submitted by

akhaliq

PaperBench: Evaluating AI's Ability to Replicate AI Research

·
13 authors

Submitted by

wenhu

ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations

·
10 authors

1

Submitted by

huangrh9

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

·
11 authors

3

Submitted by

akhaliq

Articulated Kinematics Distillation from Video Diffusion Models

·
7 authors

2

Submitted by

AdinaY

Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback

·
3 authors

Submitted by

Jarvis1111

Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

·
7 authors

1

Submitted by

YanNeu

DASH: Detection and Assessment of Systematic Hallucinations of VLMs

·
3 authors

1

Submitted by

jameslahm

LSNet: See Large, Focus Small

·
5 authors

2

Submitted by

Jiuzhouh

VerifiAgent: a Unified Verification Agent in Language Model Reasoning

·
3 authors

Submitted by

hychiang

Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models

·
6 authors

1

Submitted by

KrithikV

Medical large language models are easily distracted

·
6 authors

1

Submitted by

weizhiwang

Adaptive Layer-skipping in Pre-trained LLMs

·
3 authors

1

Submitted by

Taeksoo

Target-Aware Video Diffusion Models

·
2 authors

Submitted by

mawjdgus

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

·
2 authors

1

Submitted by

nielsr

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

·
14 authors