---
license: mit
pipeline_tag: text-to-image
tags:
- image-to-image
---
<h1 align="center">OmniGen: Unified Image Generation</h1>
For more information, please refer to our repo: https://github.com/VectorSpaceLab/OmniGen
<p align="center">
<a href="https://vectorspacelab.github.io/OmniGen/">
<img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
</a>
<a href="https://arxiv.org/abs/2409.11340">
<img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2409.11340-b31b1b.svg">
</a>
<a href="https://huggingface.co./spaces/Shitao/OmniGen">
<img alt="License" src="https://img.shields.io/badge/HF%20Demo-🤗-lightblue">
</a>
<a href="https://huggingface.co./Shitao/OmniGen-v1">
<img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
</a>
<a href="https://replicate.com/chenxwh/omnigen">
<img alt="Build" src="https://replicate.com/chenxwh/omnigen/badge">
</a>
</p>
<h4 align="center">
<p>
<a href=#1-news>News</a> |
<a href=#3-methodology>Methodology</a> |
<a href=#4-what-can-omnigen-do>Capabilities</a> |
<a href=#5-quick-start>Quick Start</a> |
<a href="#6-finetune">Finetune</a> |
<a href="#license">License</a> |
<a href="#citation">Citation</a>
</p>
</h4>
## 1. News
- 2024-10-28: We released a new version of the inference code that reduces memory usage and inference time. See [docs/inference.md](docs/inference.md#requiremented-resources) for details.
- 2024-10-22: :fire: We released the code for OmniGen. Inference: [docs/inference.md](docs/inference.md) Training: [docs/fine-tuning.md](docs/fine-tuning.md)
- 2024-10-22: :fire: We released the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co./Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co./spaces/Shitao/OmniGen)
## 2. Overview
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, **we believe that the future image generation paradigm should be simpler and more flexible, that is, generating various images directly from arbitrary multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.**
Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and we hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without designing networks for specific tasks; you just need to prepare the corresponding data and then run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.
If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: [email protected], [email protected], [email protected]. We welcome any feedback to help us improve the model.
## 3. Methodology
You can see details in our [paper](https://arxiv.org/abs/2409.11340).
## 4. What Can OmniGen do?
OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. **OmniGen doesn't need additional plugins or operations; it can automatically identify the features (e.g., required object, human pose, depth mapping) in input images according to the text prompt.**
We showcase some examples in [inference.ipynb](inference.ipynb). In [inference_demo.ipynb](inference_demo.ipynb), we show an interesting pipeline for generating and then modifying an image.
You can flexibly control image generation with OmniGen:
![demo](demo_cases.png)
If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).
## 5. Quick Start
### Using OmniGen
Install via GitHub:
```bash
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
```
You can also create a new environment to avoid conflicts:
```bash
# Create a python 3.10.12 conda env (you could also use virtualenv)
conda create -n omnigen python=3.10.12
conda activate omnigen
# Install pytorch with your CUDA version, e.g.
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/staoxiao/OmniGen.git
cd OmniGen
pip install -e .
```
Here are some examples:
```python
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
# Note: Your local model path is also acceptable, such as 'pipe = OmniGenPipeline.from_pretrained(your_local_model_path)', where all files in your_local_model_path should be organized as https://huggingface.co./Shitao/OmniGen-v1/tree/main
## Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png") # save output PIL Image
## Multi-modal to Image
# In the prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
images = pipe(
    prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
    input_images=["./imgs/test_cases/two_man.jpg"],
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
    seed=0,
)
images[0].save("example_ti2i.png") # save output PIL image
```
- If you run out of memory, set `offload_model=True`. If inference takes too long with multiple input images, reduce `max_input_image_size`. For the required resources and ways to run OmniGen efficiently, please refer to [docs/inference.md#requiremented-resources](docs/inference.md#requiremented-resources).
- For more examples of image generation, refer to [inference.ipynb](inference.ipynb) and [inference_demo.ipynb](inference_demo.ipynb).
- For more details about the inference arguments, please refer to [docs/inference.md](docs/inference.md).
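The placeholder rule described in the example comments (one `<img><|image_*|></img>` placeholder per entry in `input_images`) can be checked before calling the pipeline. The following is a minimal sketch, not part of the OmniGen package: `check_placeholders` is a hypothetical helper, and it assumes that placeholder indices start at 1 and follow the order of `input_images`, as the `[img1_path, img2_path]` example suggests.

```python
import re

# Matches placeholders such as <img><|image_1|></img> and captures the index.
PLACEHOLDER = re.compile(r"<img><\|image_(\d+)\|></img>")


def check_placeholders(prompt, input_images):
    """Hypothetical helper (not an OmniGen API).

    Raises ValueError unless the prompt references exactly the indices
    1..len(input_images). Assumption: indices are 1-based and consecutive.
    """
    # A placeholder may appear more than once, so compare unique indices.
    found = sorted({int(index) for index in PLACEHOLDER.findall(prompt)})
    expected = list(range(1, len(input_images) + 1))
    if found != expected:
        raise ValueError(
            f"prompt uses image indices {found}, "
            f"but input_images requires {expected}"
        )
    return found


# Usage with the multi-modal example above: one image, one placeholder.
prompt = (
    "A man in a black shirt is reading a book. "
    "The man is the right man in <img><|image_1|></img>."
)
check_placeholders(prompt, ["./imgs/test_cases/two_man.jpg"])
```

Running a check like this first surfaces a mismatched prompt as an explicit error before the model is loaded or run.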
### Using Diffusers
Coming soon.
### Gradio Demo
We host an online demo on [Hugging Face](https://huggingface.co./spaces/Shitao/OmniGen).
For the local Gradio demo, install `gradio` and `spaces`, and then run `app.py`:
```bash
pip install gradio spaces
python app.py
```
#### Use Google Colab
To run the demo in Google Colab, use the following commands:
```
!git clone https://github.com/staoxiao/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!python app.py --share
```
## 6. Finetune
We provide a training script `train.py` to fine-tune OmniGen.
Here is a toy example of LoRA fine-tuning:
```bash
accelerate launch --num_processes=1 train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 10 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora
```
Please refer to [docs/fine-tuning.md](docs/fine-tuning.md) for more details (e.g. full finetune).
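As a quick sanity check when scaling the toy command to more GPUs, the effective batch size can be estimated from three of its flags. This is a sketch that assumes `train.py` follows the usual Accelerate convention (per-device batch size × number of processes × gradient accumulation steps); `effective_batch_size` is a hypothetical helper, not part of the repo.

```python
def effective_batch_size(batch_size_per_device, num_processes,
                         gradient_accumulation_steps):
    """Samples contributing to each optimizer step.

    Assumption: the usual data-parallel convention, not confirmed
    against train.py.
    """
    return batch_size_per_device * num_processes * gradient_accumulation_steps


# Values from the toy command above:
# --batch_size_per_device 2, --num_processes=1, --gradient_accumulation_steps 1
print(effective_batch_size(2, 1, 1))  # 2
```

Under the same assumption, keeping the flags unchanged while launching with more processes multiplies the effective batch size by the number of processes, which may call for revisiting `--lr`.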
### Contributors:
Thanks to all our contributors for their efforts. New members are warmly welcome to join!
<a href="https://github.com/VectorSpaceLab/OmniGen/graphs/contributors">
<img src="https://contrib.rocks/image?repo=VectorSpaceLab/OmniGen" />
</a>
## License
This repo is licensed under the [MIT License](LICENSE).
## Citation
If you find this repository useful, please consider giving it a star ⭐ and citing our paper:
```
@article{xiao2024omnigen,
title={Omnigen: Unified image generation},
author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
journal={arXiv preprint arXiv:2409.11340},
year={2024}
}
```