Novel Object 6D Pose Estimation with a Single Reference View
Abstract
Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in the camera coordinate system based on state space models (SSMs). Specifically, iterative camera-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs can capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that we achieve on-par performance with CAD-based and dense reference view-based methods, despite operating in the more challenging single reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.
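To ground the idea of point-wise alignment in the camera coordinate system, here is a minimal sketch of how a 6D pose can be recovered from corresponding 3D points with a closed-form least-squares (Kabsch) fit. This is an illustration only, not the SinRef-6D pipeline: the function name, the use of NumPy, and the assumption that correspondences are already given are ours; the paper establishes the alignment iteratively with its RGB and Points SSMs.

```python
# Hypothetical sketch: recover a rigid 6D pose (R, t) from point-wise
# correspondences in the camera coordinate system via a Kabsch/SVD fit.
# Not the SinRef-6D implementation; correspondences are assumed given.
import numpy as np

def rigid_pose_from_correspondences(src: np.ndarray, dst: np.ndarray):
    """Estimate R (3x3) and t (3,) minimizing sum ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    src_c = src - src.mean(axis=0)            # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Toy usage: recover a known pose from noiseless correspondences.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(100, 3))
    angle = np.pi / 6
    R_gt = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle),  np.cos(angle), 0.0],
                     [0.0, 0.0, 1.0]])
    t_gt = np.array([0.1, -0.2, 0.5])
    dst = src @ R_gt.T + t_gt
    R, t = rigid_pose_from_correspondences(src, dst)
    assert np.allclose(R, R_gt, atol=1e-6) and np.allclose(t, t_gt, atol=1e-6)
```

In practice such a closed-form solve only works once good correspondences exist; the paper's contribution is obtaining and refining that camera-space alignment iteratively from a single reference view.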
Community
We are excited to share our latest work "Novel Object 6D Pose Estimation with a Single Reference View".
Our approach, SinRef-6D, is a single-reference-view, CAD-model-free method for novel object 6D pose estimation; it is simple yet effective and scales well to practical applications.
Specifically, SinRef-6D eliminates the need for object CAD models, dense reference views, and model retraining all at once, improving efficiency and scalability while generalizing well to real-world robotic applications.
Paper: https://arxiv.org/abs/2503.05578
Code: https://github.com/CNJianLiu/SinRef-6D
Librarian Bot (automated): the following similar papers were recommended by the Semantic Scholar API.
- AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation (2025)
- Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation (2025)
- SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting (2025)
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views (2025)
- HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation (2025)
- RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking (2025)
- ZeroBP: Learning Position-Aware Correspondence for Zero-shot 6D Pose Estimation in Bin-Picking (2025)