camenduru committed on Nov 20, 2023

Commit 732ea87 • 1 Parent(s): 65e0954

thanks to h94 ❤

Files changed (28)

README.md +46 -0
fig1.png +0 -0
models/image_encoder/config.json +23 -0
models/image_encoder/model.safetensors +3 -0
models/image_encoder/pytorch_model.bin +3 -0
models/ip-adapter-full-face_sd15.bin +3 -0
models/ip-adapter-full-face_sd15.safetensors +3 -0
models/ip-adapter-plus-face_sd15.bin +3 -0
models/ip-adapter-plus-face_sd15.safetensors +3 -0
models/ip-adapter-plus_sd15.bin +3 -0
models/ip-adapter-plus_sd15.safetensors +3 -0
models/ip-adapter_sd15.bin +3 -0
models/ip-adapter_sd15.safetensors +3 -0
models/ip-adapter_sd15_light.bin +3 -0
models/ip-adapter_sd15_light.safetensors +3 -0
models/ip-adapter_sd15_vit-G.bin +3 -0
models/ip-adapter_sd15_vit-G.safetensors +3 -0
sdxl_models/image_encoder/config.json +81 -0
sdxl_models/image_encoder/model.safetensors +3 -0
sdxl_models/image_encoder/pytorch_model.bin +3 -0
sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin +3 -0
sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors +3 -0
sdxl_models/ip-adapter-plus_sdxl_vit-h.bin +3 -0
sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors +3 -0
sdxl_models/ip-adapter_sdxl.bin +3 -0
sdxl_models/ip-adapter_sdxl.safetensors +3 -0
sdxl_models/ip-adapter_sdxl_vit-h.bin +3 -0
sdxl_models/ip-adapter_sdxl_vit-h.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,46 @@

+---
+tags:
+- text-to-image
+- stable-diffusion
+license: apache-2.0
+language:
+- en
+library_name: diffusers
+---
+# IP-Adapter Model Card
+<div align="center">
+[**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
+</div>
+---
+## Introduction
+We present IP-Adapter, an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters can achieve performance comparable to, or even better than, a fine-tuned image prompt model. IP-Adapter generalizes not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. Moreover, the image prompt also works well with the text prompt to accomplish multimodal image generation.
+![arch](./fig1.png)
+## Models
+### Image Encoder
+- [models/image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/models/image_encoder): [OpenCLIP-ViT-H-14](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) with 632.08M parameters
+- [sdxl_models/image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder): [OpenCLIP-ViT-bigG-14](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k) with 1844.9M parameters
+More information can be found [here](https://laion.ai/blog/giant-openclip/)
+### IP-Adapter for SD 1.5
+- [ip-adapter_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15.bin): use global image embedding from OpenCLIP-ViT-H-14 as condition
+- [ip-adapter_sd15_light.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter_sd15_light.bin): same as ip-adapter_sd15, but more compatible with text prompt
+- [ip-adapter-plus_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter-plus_sd15.bin): use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_sd15
+- [ip-adapter-plus-face_sd15.bin](https://huggingface.co/h94/IP-Adapter/blob/main/models/ip-adapter-plus-face_sd15.bin): same as ip-adapter-plus_sd15, but use cropped face image as condition
+### IP-Adapter for SDXL 1.0
+- [ip-adapter_sdxl.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter_sdxl.bin): use global image embedding from OpenCLIP-ViT-bigG-14 as condition
+- [ip-adapter_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter_sdxl_vit-h.bin): same as ip-adapter_sdxl, but use OpenCLIP-ViT-H-14
+- [ip-adapter-plus_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus_sdxl_vit-h.bin): use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_sdxl and ip-adapter_sdxl_vit-h
+- [ip-adapter-plus-face_sdxl_vit-h.bin](https://huggingface.co/h94/IP-Adapter/blob/main/sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin): same as ip-adapter-plus_sdxl_vit-h, but use cropped face image as condition
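
The model lists in the README pair each adapter checkpoint with a specific image encoder. Below is a minimal sketch of that pairing, assuming the files are fetched from the `h94/IP-Adapter` repository the README links to and attached through the `load_ip_adapter` method of a diffusers pipeline (in a diffusers release that accepts `image_encoder_folder`). The names `AdapterFiles`, `ADAPTERS`, and `attach_adapter` are hypothetical helpers, the `.safetensors` file names come from the file list of this commit, and only the variants the README describes are included.

```python
from dataclasses import dataclass

REPO_ID = "h94/IP-Adapter"  # assumption: the upstream repo linked from the README


@dataclass(frozen=True)
class AdapterFiles:
    subfolder: str             # folder holding the adapter checkpoint
    weight_name: str           # checkpoint file name inside that folder
    image_encoder_folder: str  # repo-relative folder of the matching CLIP encoder


VIT_H = "models/image_encoder"          # OpenCLIP-ViT-H-14
VIT_BIGG = "sdxl_models/image_encoder"  # OpenCLIP-ViT-bigG-14

# Pairings as stated in the README's model lists.
ADAPTERS = {
    "ip-adapter_sd15": AdapterFiles("models", "ip-adapter_sd15.safetensors", VIT_H),
    "ip-adapter_sd15_light": AdapterFiles("models", "ip-adapter_sd15_light.safetensors", VIT_H),
    "ip-adapter-plus_sd15": AdapterFiles("models", "ip-adapter-plus_sd15.safetensors", VIT_H),
    "ip-adapter-plus-face_sd15": AdapterFiles("models", "ip-adapter-plus-face_sd15.safetensors", VIT_H),
    "ip-adapter_sdxl": AdapterFiles("sdxl_models", "ip-adapter_sdxl.safetensors", VIT_BIGG),
    "ip-adapter_sdxl_vit-h": AdapterFiles("sdxl_models", "ip-adapter_sdxl_vit-h.safetensors", VIT_H),
    "ip-adapter-plus_sdxl_vit-h": AdapterFiles("sdxl_models", "ip-adapter-plus_sdxl_vit-h.safetensors", VIT_H),
    "ip-adapter-plus-face_sdxl_vit-h": AdapterFiles("sdxl_models", "ip-adapter-plus-face_sdxl_vit-h.safetensors", VIT_H),
}


def attach_adapter(pipe, name: str):
    """Attach the named adapter to an already constructed diffusers pipeline."""
    files = ADAPTERS[name]
    pipe.load_ip_adapter(
        REPO_ID,
        subfolder=files.subfolder,
        weight_name=files.weight_name,
        image_encoder_folder=files.image_encoder_folder,
    )
    return pipe
```

The point of the table is the SDXL `vit-h` variants: their checkpoints live under `sdxl_models/`, but per the README they expect the ViT-H encoder stored under `models/image_encoder`.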

fig1.png ADDED

models/image_encoder/config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "_name_or_path": "./image_encoder",
+  "architectures": [
+    "CLIPVisionModelWithProjection"
+  ],
+  "attention_dropout": 0.0,
+  "dropout": 0.0,
+  "hidden_act": "gelu",
+  "hidden_size": 1280,
+  "image_size": 224,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_norm_eps": 1e-05,
+  "model_type": "clip_vision_model",
+  "num_attention_heads": 16,
+  "num_channels": 3,
+  "num_hidden_layers": 32,
+  "patch_size": 14,
+  "projection_dim": 1024,
+  "torch_dtype": "float16",
+  "transformers_version": "4.28.0.dev0"
+}
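
The config above is enough to check the README's "632.08M" figure by hand. The sketch below counts parameters under the assumption that the checkpoint follows the standard Hugging Face `CLIPVisionModelWithProjection` layout (bias-free patch convolution and projection, biased attention and MLP layers, LayerNorm before and after the encoder); the function names are illustrative.

```python
def num_tokens(cfg: dict) -> int:
    """Sequence length seen by the transformer: image patches plus one class token."""
    patches = (cfg["image_size"] // cfg["patch_size"]) ** 2
    return patches + 1


def clip_vision_param_count(cfg: dict) -> int:
    """Parameter count of a CLIP vision tower with projection head."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]

    embeddings = (
        cfg["num_channels"] * cfg["patch_size"] ** 2 * h  # patch conv, no bias
        + h                                               # class embedding
        + num_tokens(cfg) * h                             # learned position embeddings
    )
    attention = 4 * (h * h + h)         # q, k, v, out projections with biases
    mlp = (h * ff + ff) + (ff * h + h)  # two linear layers with biases
    norms = 2 * (2 * h)                 # two LayerNorms per block (weight + bias)
    per_layer = attention + mlp + norms

    outer_norms = 2 * (2 * h)               # LayerNorm before and after the encoder
    projection = h * cfg["projection_dim"]  # visual projection, no bias
    return embeddings + cfg["num_hidden_layers"] * per_layer + outer_norms + projection


# Values copied from models/image_encoder/config.json.
VIT_H_CFG = {
    "hidden_size": 1280,
    "image_size": 224,
    "intermediate_size": 5120,
    "num_channels": 3,
    "num_hidden_layers": 32,
    "patch_size": 14,
    "projection_dim": 1024,
}

print(num_tokens(VIT_H_CFG))               # 257
print(clip_vision_param_count(VIT_H_CFG))  # 632076800
```

Plugging in the values from `sdxl_models/image_encoder/config.json` (hidden size 1664, intermediate size 8192, 48 layers, projection dimension 1280) gives 1,844,907,264, which agrees with the README's 1844.9M.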

models/image_encoder/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ca9667da1ca9e0b0f75e46bb030f7e011f44f86cbfb8d5a36590fcd7507b030
+size 2528373448
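
Each weight file in this commit is stored as a Git LFS pointer: three text lines giving the spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of how a downloaded file could be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for models/image_encoder/model.safetensors, with the diff's "+" markers removed.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6ca9667da1ca9e0b0f75e46bb030f7e011f44f86cbfb8d5a36590fcd7507b030
size 2528373448
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 2528373448
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then confirm that the download is complete and unmodified.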

models/image_encoder/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d3ec1e66737f77a4f3bc2df3c52eacefc69ce7825e2784183b1d4e9877d9193
+size 2528481905

models/ip-adapter-full-face_sd15.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47ec4644114f3bfe25b2fc830af6b0dd8dcad9a0371a238b9cc919465c60d1dc
+size 43592551

models/ip-adapter-full-face_sd15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a17fb643bf876235a45a0e87a49da2855be6584b28ca04c62a97ab5ff1c6f3
+size 43592352

models/ip-adapter-plus-face_sd15.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa09c22b49ef63474dcde12f26a35b8b8e9b755b716a553aa29e8dbe8d21e0c9
+size 98183381

models/ip-adapter-plus-face_sd15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c9edc21af6f737dc1d6e0e734190e976cfacf802d6b024b77aa3be922f7569b
+size 98183288

models/ip-adapter-plus_sd15.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1cb77fc0613369b66be1531cc452b823a4af7d87ee56956000a69fc39e3817ba
+size 158033179

models/ip-adapter-plus_sd15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a1c250be40455cc61a43da1201ec3f1edaea71214865fb47f57927e06cbe4996
+size 98183288

models/ip-adapter_sd15.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68e1df30d760f280e578c302f1e73b37ea08654eff16a31153588047affe0058
+size 44642825

models/ip-adapter_sd15.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:289b45f16d043d0bf542e45831f971dcdaabe18b656f11e86d9dfba7e9ee3369
+size 44642768
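
The pointer size gives a quick sanity check on the README's claim of "only 22M parameters". The arithmetic below is a rough estimate that assumes the weights are stored as 16-bit floats (2 bytes each) and ignores the small safetensors header; neither assumption is stated in the commit itself.

```python
def approx_params_millions(size_bytes: int, bytes_per_param: int = 2) -> float:
    """Estimate the parameter count (in millions) from a checkpoint's file size."""
    return size_bytes / bytes_per_param / 1e6


# Size taken from the LFS pointer of models/ip-adapter_sd15.safetensors.
print(round(approx_params_millions(44_642_768), 1))  # 22.3
```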

models/ip-adapter_sd15_light.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f71bfbdd937f2edad0c894ec72d12db02b3be0316f62988e5fc669ca4da6b7e1
+size 44642819

models/ip-adapter_sd15_light.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0747d08db670535bfa286452a77d93cebad5c677b46d038543f9f2de8690bb26
+size 44642768

models/ip-adapter_sd15_vit-G.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1398e9ae37cb65553a8525871830a283914dafd9ec3039716344a826399ec474
+size 46215689

models/ip-adapter_sd15_vit-G.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a26f736af07bb341a83dfea23713531d0575760e8ed947c68cb31a4c62d9c90b
+size 46215640

sdxl_models/image_encoder/config.json ADDED

	@@ -0,0 +1,81 @@

+{
+  "architectures": [
+    "CLIPVisionModelWithProjection"
+  ],
+  "_name_or_path": "",
+  "add_cross_attention": false,
+  "architectures": null,
+  "attention_dropout": 0.0,
+  "bad_words_ids": null,
+  "begin_suppress_tokens": null,
+  "bos_token_id": null,
+  "chunk_size_feed_forward": 0,
+  "cross_attention_hidden_size": null,
+  "decoder_start_token_id": null,
+  "diversity_penalty": 0.0,
+  "do_sample": false,
+  "dropout": 0.0,
+  "early_stopping": false,
+  "encoder_no_repeat_ngram_size": 0,
+  "eos_token_id": null,
+  "exponential_decay_length_penalty": null,
+  "finetuning_task": null,
+  "forced_bos_token_id": null,
+  "forced_eos_token_id": null,
+  "hidden_act": "gelu",
+  "hidden_size": 1664,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1"
+      },
+  "image_size": 224,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 8192,
+  "is_decoder": false,
+  "is_encoder_decoder": false,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1
+      },
+  "layer_norm_eps": 1e-05,
+  "length_penalty": 1.0,
+  "max_length": 20,
+  "min_length": 0,
+  "model_type": "clip_vision_model",
+  "no_repeat_ngram_size": 0,
+  "num_attention_heads": 16,
+  "num_beam_groups": 1,
+  "num_beams": 1,
+  "num_channels": 3,
+  "num_hidden_layers": 48,
+  "num_return_sequences": 1,
+  "output_attentions": false,
+  "output_hidden_states": false,
+  "output_scores": false,
+  "pad_token_id": null,
+  "patch_size": 14,
+  "prefix": null,
+  "problem_type": null,
+  "pruned_heads": {},
+  "remove_invalid_values": false,
+  "repetition_penalty": 1.0,
+  "return_dict": true,
+  "return_dict_in_generate": false,
+  "sep_token_id": null,
+  "suppress_tokens": null,
+  "task_specific_params": null,
+  "temperature": 1.0,
+  "tf_legacy_loss": false,
+  "tie_encoder_decoder": false,
+  "tie_word_embeddings": true,
+  "tokenizer_class": null,
+  "top_k": 50,
+  "top_p": 1.0,
+  "torch_dtype": null,
+  "torchscript": false,
+  "transformers_version": "4.24.0",
+  "typical_p": 1.0,
+  "use_bfloat16": false,
+  "projection_dim": 1280
+}
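
Note that this config defines the `"architectures"` key twice: once as a list and once as `null`. JSON parsers differ on duplicate keys; Python's `json` module keeps the last value. The sketch below, using a hypothetical `find_duplicate_keys` helper and a shortened excerpt of the config, shows one way to surface such duplicates.

```python
import json


def find_duplicate_keys(text: str) -> list:
    """Return keys that occur more than once within the same JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)  # same result the default parser would produce

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Shortened excerpt of sdxl_models/image_encoder/config.json.
SNIPPET = """{
  "architectures": ["CLIPVisionModelWithProjection"],
  "_name_or_path": "",
  "architectures": null,
  "hidden_size": 1664
}"""

print(find_duplicate_keys(SNIPPET))          # ['architectures']
print(json.loads(SNIPPET)["architectures"])  # None
```

With Python's parser the later `null` wins, so code that relies on `architectures` to pick a model class should not assume the list value survives loading.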

sdxl_models/image_encoder/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:657723e09f46a7c3957df651601029f66b1748afb12b419816330f16ed45d64d
+size 3689912664

sdxl_models/image_encoder/pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2999562fbc02f9dc0d9c0acb7cf0970ec3a9b2a578d7d05afe82191d606d2d80
+size 3690112753

sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:50e886d82940b3c5873d80c2b06d8a4b0d0fccec70bc44fd53f16ac3cfd7fc36
+size 1013454761

sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:677ad8860204f7d0bfba12d29e6c31ded9beefdf3e4bbd102518357d31a292c1
+size 847517512

sdxl_models/ip-adapter-plus_sdxl_vit-h.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec70edb7cc8e769c9388d94eeaea3e4526352c9fae793a608782d1d8951fde90
+size 1013454427

sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f5062b8400c94b7159665b21ba5c62acdcd7682262743d7f2aefedef00e6581
+size 847517512

sdxl_models/ip-adapter_sdxl.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7525f2731e9e86d1368e0b68467615d55dda459691965bdd7d37fa3d7fd84c12
+size 702585097

sdxl_models/ip-adapter_sdxl.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba1002529e783604c5f326d49f0122025392d1d20ac8d573b3eeb3e6dea4ebb6
+size 702585376

sdxl_models/ip-adapter_sdxl_vit-h.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b382e2501d0ab3fe2e09312e561a59cd3f21262aff25373700e0cd62c635929
+size 698390793

sdxl_models/ip-adapter_sdxl_vit-h.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ebf05d918348aec7abb02a5e9ecef77e0aaea6914a5c4ea13f50d45eb1681831
+size 698391064