BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
Abstract
Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/
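To make the blob primitive concrete, here is a minimal sketch of how such a representation can be rasterized. It assumes blobs are parameterized as oriented 2D Gaussian ellipses (center, per-axis scales, rotation) splatted into a soft opacity map; the function name, parameterization, and tensor shapes are illustrative assumptions, not BlobCtrl's actual implementation.

```python
# A minimal, illustrative sketch (NOT the paper's exact formulation): each
# blob is an oriented, ellipse-shaped 2D Gaussian defined by a center in
# normalized [0, 1] coordinates, per-axis scales, and a rotation angle, and
# is splatted into a soft opacity map that localizes one element.
import math
import torch

def splat_blob(center, scales, theta, size=64):
    """Render one blob as a (size, size) soft opacity map in [0, 1]."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, size), torch.linspace(0, 1, size), indexing="ij"
    )
    grid = torch.stack([xs, ys], dim=-1)            # (H, W, 2) pixel coords
    offset = grid - torch.as_tensor(center)         # offset from blob center
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, s], [-s, c]])           # rotate into the blob frame
    local = offset @ rot.T
    # Squared Mahalanobis distance under the diagonal scale matrix gives a
    # smooth elliptical falloff; exp(-d2/2) turns it into an opacity in [0, 1].
    d2 = (local / torch.as_tensor(scales)).pow(2).sum(-1)
    return torch.exp(-0.5 * d2)

# Example: an elongated blob near the image center, rotated ~30 degrees.
opacity = splat_blob(center=(0.5, 0.55), scales=(0.20, 0.08), theta=math.pi / 6)
print(opacity.shape)  # torch.Size([64, 64])
```

In a representation like this, moving, enlarging, or shrinking an element amounts to editing a handful of blob parameters while the semantic and identity features attached to the blob stay fixed, which is the kind of decoupled spatial control the abstract describes.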
Community
BlobCtrl enables precise, user-friendly element-level visual manipulation.
Main Features: 🦉 Element-level Add/Remove/Move/Replace/Enlarge/Shrink.
arXiv: http://arxiv.org/abs/2503.13434
GitHub Code: https://github.com/TencentARC/BlobCtrl
Project Webpage: https://liyaowei-stu.github.io/project/BlobCtrl/
Hugging Face Demo: https://huggingface.co./spaces/Yw22/BlobCtrl
Hugging Face Models: https://huggingface.co./Yw22/BlobCtrl
YouTube: https://youtu.be/rdR4QRR-mbE
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing (2025)
- VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control (2025)
- DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model (2025)
- OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting (2025)
- Get In Video: Add Anything You Want to the Video (2025)
- ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions (2025)
- Personalize Anything for Free with Diffusion Transformer (2025)