NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
Abstract
Recent advancements in generative models have significantly improved novel view synthesis (NVS) from multi-view data. However, existing methods depend on external multi-view alignment processes, such as explicit pose estimation or pre-reconstruction, which limits their flexibility and accessibility, especially when alignment is unstable due to insufficient overlap or occlusions between views. In this paper, we propose NVComposer, a novel approach that eliminates the need for explicit external alignment. NVComposer enables the generative model to implicitly infer spatial and geometric relationships between multiple conditional views by introducing two key components: 1) an image-pose dual-stream diffusion model that simultaneously generates target novel views and condition camera poses, and 2) a geometry-aware feature alignment module that distills geometric priors from dense stereo models during training. Extensive experiments demonstrate that NVComposer achieves state-of-the-art performance in generative multi-view NVS tasks, removing the reliance on external alignment and thus improving model accessibility. Our approach shows substantial improvements in synthesis quality as the number of unposed input views increases, highlighting its potential for more flexible and accessible generative NVS systems.
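The abstract does not say how the geometry-aware feature alignment module measures agreement between the diffusion model's features and those of the dense stereo model. As a minimal sketch only, assuming a cosine-similarity distillation objective over per-location feature vectors (a common choice, not confirmed by the source), the idea could look like the following. The names `cosine_similarity` and `feature_alignment_loss` are hypothetical and do not come from the NVComposer code base.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_a * norm_b, eps)


def feature_alignment_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean (1 - cosine similarity) over spatial locations.

    ASSUMPTION: `student` stands in for features from the diffusion model and
    `teacher` for features from a frozen dense stereo model, one vector per
    spatial location. The actual loss used by the paper is not given in the
    abstract.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher need the same non-zero number of locations")
    total = sum(1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher))
    return total / len(student)
```

Under this assumption, the loss is 0 when every student feature points in the same direction as its teacher feature and grows toward 2 as they point in opposite directions. In training, a term like this would typically be added to the diffusion objective with a weighting factor, which is likewise a hypothetical design choice here.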
Community
Our GitHub repository: https://github.com/TencentARC/NVComposer
Hugging Face Space for the demo: https://huggingface.co/spaces/l-li/NVComposer
We welcome your feedback, questions, or collaboration. Thank you!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis (2024)
- Novel View Synthesis with Pixel-Space Diffusion Models (2024)
- MVBoost: Boost 3D Reconstruction with Multi-View Refinement (2024)
- FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation (2024)
- MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model (2024)
- World-consistent Video Diffusion with Explicit 3D Modeling (2024)
- Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors (2024)
You can ask Librarian Bot for paper recommendations directly by tagging it in a comment: @librarian-bot recommend