Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Abstract
With the rapid proliferation of 3D devices and the shortage of 3D content, stereo conversion is attracting increasing attention. Recent works introduce pretrained Diffusion Models (DMs) into this task. However, because large-scale training data and comprehensive benchmarks are scarce, the best way to employ DMs in stereo conversion and the accurate evaluation of stereo effects remain largely unexplored. In this work, we introduce the Mono2Stereo dataset, providing high-quality training data and a benchmark to support in-depth exploration of stereo conversion. With this dataset, we conduct an empirical study that yields two primary findings. 1) The differences between the left and right views are subtle, yet existing metrics weigh all pixels equally and fail to concentrate on the regions critical to stereo effects. 2) Mainstream methods adopt either one-stage left-to-right generation or a warp-and-inpaint pipeline, which suffer from a degraded stereo effect and image distortion, respectively. Based on these findings, we introduce a new evaluation metric, Stereo Intersection-over-Union, which prioritizes disparity and achieves a high correlation with human judgments on stereo effect. Moreover, we propose a strong baseline model that balances stereo effect and image quality and notably surpasses current mainstream methods. Our code and data will be open-sourced to promote further research in stereo conversion. Our models are available at mono2stereo-bench.github.io.
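To make the "warp" half of a warp-and-inpaint pipeline concrete, the following is a minimal sketch of disparity-based forward warping in NumPy. It is an illustration under stated assumptions, not the method of any particular paper: the function name `forward_warp`, the sign convention (a left-view pixel at column x lands at column x - d in the right view), and the nearest-pixel rounding are all choices made here for clarity. The returned hole mask marks the disoccluded pixels that an inpainting stage would then have to fill.

```python
import numpy as np

def forward_warp(left, disparity):
    """Shift each left-view pixel horizontally by its disparity.

    left:      (H, W) or (H, W, C) array, the left view.
    disparity: (H, W) array of non-negative disparities in pixels
               (assumed convention: larger value = closer to the camera).
    Returns (right, holes), where holes is True wherever no source
    pixel landed and the right view still needs inpainting.
    """
    h, w = disparity.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    # Simple z-buffer: when two pixels land on the same target,
    # keep the one with the larger disparity (the closer surface).
    best = np.full((h, w), -np.inf)

    ys, xs = np.mgrid[0:h, 0:w]
    target = np.rint(xs - disparity).astype(int)
    valid = (target >= 0) & (target < w)

    for y, x, t in zip(ys[valid], xs[valid], target[valid]):
        d = disparity[y, x]
        if d > best[y, t]:
            best[y, t] = d
            right[y, t] = left[y, x]
            filled[y, t] = True

    return right, ~filled
```

The explicit Python loop keeps the occlusion handling easy to read; a practical implementation would vectorize it or run it on a GPU. The holes concentrate along depth discontinuities, which is where a warp-and-inpaint pipeline depends most heavily on its inpainting stage.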
Key Contributions:
• We construct Mono2Stereo, a large-scale benchmark designed for high-quality stereo conversion. This benchmark encompasses three key dimensions to facilitate a comprehensive evaluation of such methods.
• We introduce Stereo Intersection-over-Union (SIoU), a novel and pioneering evaluation metric designed to assess the prominence of stereoscopic effects in stereo pairs. This metric effectively complements existing evaluation metrics for a thorough assessment.
• Through extensive experiments, we establish a strong baseline model for stereo conversion. Benefiting from dual conditioning and Edge Consistency loss, our model achieves both compelling image quality and convincing stereo effects.
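The exact definition of SIoU is not given in this summary, so the snippet below is not the paper's metric. It is only a hedged sketch of the general idea behind a disparity-focused score: instead of averaging an error over all pixels, compare the regions where the right view differs from the left view. The function names `difference_mask` and `disparity_region_iou` and the `threshold` parameter are hypothetical and introduced purely for illustration.

```python
import numpy as np

def difference_mask(left, right, threshold=0.1):
    """Binary mask of pixels where the two views differ noticeably.

    Inputs are assumed to be float images scaled to [0, 1];
    colour channels, if present, are averaged.
    """
    diff = np.abs(left.astype(np.float64) - right.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)
    return diff > threshold

def disparity_region_iou(left, pred_right, gt_right, threshold=0.1):
    """IoU between the left/right difference regions of a predicted
    and a ground-truth right view (illustrative only, not SIoU)."""
    pred = difference_mask(left, pred_right, threshold)
    gt = difference_mask(left, gt_right, threshold)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Neither view differs from the left image anywhere.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)
```

A prediction that simply copies the left view scores 0 under this sketch whenever the ground truth contains any visible disparity, whereas a pixel-averaged metric such as PSNR could still rate it highly because most pixels barely change between views.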