JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published about 24 hours ago • 10
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Paper • 2411.05738 • Published 5 days ago • 12
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published 14 days ago • 45
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published 20 days ago • 34
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published 21 days ago • 16
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 23 days ago • 65
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation Paper • 2410.15748 • Published 23 days ago • 12
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Paper • 2410.13726 • Published 27 days ago • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published 28 days ago • 40
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published 27 days ago • 23
GS^3: Efficient Relighting with Triple Gaussian Splatting Paper • 2410.11419 • Published 29 days ago • 10
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published Oct 10 • 44
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 35
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 37
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models Paper • 2311.13141 • Published Nov 22, 2023 • 13
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published Nov 22, 2023 • 50
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8 • 107
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 156