Submitted by Juanxi · 56 upvotes · MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization · 11 authors · 5 comments
Submitted by Howe666 · 29 upvotes · AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction · 5 authors · 1 comment
Submitted by akhaliq · 24 upvotes · DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance · 6 authors · 4 comments
Submitted by 8ruceLi · 24 upvotes · Towards Physically Plausible Video Generation via VLM Planning · 11 authors · 2 comments
Submitted by hanyang-21 · 22 upvotes · VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step · 4 authors · 1 comment
Submitted by wenhu · 19 upvotes · ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations · 10 authors · 1 comment
Submitted by huangrh9 · 16 upvotes · ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement · 11 authors · 3 comments
Submitted by akhaliq · 13 upvotes · Articulated Kinematics Distillation from Video Diffusion Models · 7 authors · 2 comments
Submitted by AdinaY · 12 upvotes · Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback · 3 authors · 1 comment
Submitted by Jarvis1111 · 11 upvotes · Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks · 7 authors · 1 comment
Submitted by YanNeu · 9 upvotes · DASH: Detection and Assessment of Systematic Hallucinations of VLMs · 3 authors · 1 comment
Submitted by Jiuzhouh · 2 upvotes · VerifiAgent: a Unified Verification Agent in Language Model Reasoning · 3 authors · 1 comment
Submitted by hychiang · 2 upvotes · Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models · 6 authors · 1 comment
Submitted by mawjdgus · 1 upvote · Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations · 2 authors · 1 comment
Submitted by nielsr · 1 upvote · MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis · 14 authors · 1 comment