nielsr (HF staff) committed
Commit eeea639 · 1 Parent(s): 732c0b1

Create README.md

Files changed (1)
  1. README.md +63 -0
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ license: mit
+ ---
+
+ # DPT 3.1 (Swinv2 backbone)
+
+ A DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/MiDaS/tree/master).
+
+ Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.
+
+ ## Model description
+
+ This DPT model uses the [Swinv2](https://huggingface.co/docs/transformers/model_doc/swinv2) model as its backbone and adds a neck and a head on top for monocular depth estimation.
+
+ ![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)
+
+ ## How to use
+
+ Here is how to use this model for zero-shot depth estimation on an image:
+
+ ```python
+ from transformers import DPTImageProcessor, DPTForDepthEstimation
+ import torch
+ import numpy as np
+ from PIL import Image
+ import requests
+
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ processor = DPTImageProcessor.from_pretrained("Intel/dpt-swinv2-base-384")
+ model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-base-384")
+
+ # prepare image for the model
+ inputs = processor(images=image, return_tensors="pt")
+
+ # run inference without tracking gradients
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predicted_depth = outputs.predicted_depth
+
+ # interpolate to original size (PIL reports (width, height), so reverse it)
+ prediction = torch.nn.functional.interpolate(
+     predicted_depth.unsqueeze(1),
+     size=image.size[::-1],
+     mode="bicubic",
+     align_corners=False,
+ )
+
+ # visualize the prediction as an 8-bit grayscale image
+ output = prediction.squeeze().cpu().numpy()
+ formatted = (output * 255 / np.max(output)).astype("uint8")
+ depth = Image.fromarray(formatted)
+ ```
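The visualization step above divides by the maximum value only, which works when the map is non-negative and its maximum is positive. A minimal sketch of a more defensive variant follows, assuming min-max scaling is acceptable for display purposes. `depth_to_uint8` is a hypothetical helper name (not part of `transformers`), and the random array is only a stand-in for a real prediction:

```python
import numpy as np
from PIL import Image


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a 2-D depth map to the 0..255 range (hypothetical helper)."""
    depth = depth.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # Constant map: return all zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return (scaled * 255.0).round().astype(np.uint8)


# Stand-in for `prediction.squeeze().cpu().numpy()` from the example above.
rng = np.random.default_rng(0)
fake_depth = rng.uniform(0.5, 20.0, size=(48, 64))

formatted = depth_to_uint8(fake_depth)
depth_image = Image.fromarray(formatted)  # 8-bit grayscale image
```

Min-max scaling uses the full 0..255 range even when the smallest predicted value is far from zero, at the cost of discarding the absolute scale of the prediction.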
+
+ Alternatively, use the pipeline API:
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline(task="depth-estimation", model="Intel/dpt-swinv2-base-384")
+ result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
+ # the depth map as a PIL image
+ result["depth"]
+ ```
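The `depth` entry returned by the pipeline is a PIL image, so it can be inspected with ordinary PIL operations. A minimal sketch for placing the input and the depth map next to each other follows. `side_by_side` is a hypothetical helper (not part of `transformers`), and the solid-color images are placeholders for the real input photo and `result["depth"]`:

```python
from PIL import Image


def side_by_side(image: Image.Image, depth: Image.Image) -> Image.Image:
    """Paste an image and its depth map next to each other (hypothetical helper)."""
    image = image.convert("RGB")
    depth = depth.convert("RGB")
    if depth.size != image.size:
        # Match the input resolution in case the depth map differs in size.
        depth = depth.resize(image.size)
    width, height = image.size
    canvas = Image.new("RGB", (2 * width, height))
    canvas.paste(image, (0, 0))
    canvas.paste(depth, (width, 0))
    return canvas


# Placeholders: in practice `image` is the input photo and `depth` is result["depth"].
image = Image.new("RGB", (64, 48), color=(200, 30, 30))
depth = Image.new("L", (64, 48), color=128)

comparison = side_by_side(image, depth)
```

Calling `comparison.save("comparison.png")` would then write the pair to disk; the file name is only an example.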