---
license: mit
---

# DPT 3.1 (Swinv2 backbone)

DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/MiDaS/tree/master).

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card was written by the Hugging Face team.

## Model description

This DPT model uses the [Swinv2](https://huggingface.co/docs/transformers/model_doc/swinv2) model as its backbone and adds a neck and a head on top for monocular depth estimation.

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)

## How to use

Here is how to use this model for zero-shot depth estimation on an image:

```python
from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

# download an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-swinv2-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

# run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
# (PIL reports size as (width, height); interpolate expects (height, width))
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction as an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```

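The final step of the snippet above scales the depth map by its maximum only, so the smallest value is not necessarily mapped to 0. As a hedged alternative, the sketch below min-max normalizes the map before converting it to 8-bit. `depth_to_uint8` is a hypothetical helper, not part of `transformers` or the original MiDaS code, and a small synthetic array stands in for `prediction.squeeze().cpu().numpy()` so the example runs without downloading the model.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a float depth map to the 0..255 range (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # A constant map has no contrast to stretch; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # values now lie in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic stand-in for `prediction.squeeze().cpu().numpy()`
fake_depth = np.array([[0.0, 1.0], [3.0, 5.0]], dtype=np.float32)
gray = depth_to_uint8(fake_depth)
```

To view the result, wrap it with `Image.fromarray(gray)` as in the snippet above.
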
The same prediction is also available through the pipeline API:

```python
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-swinv2-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]
```
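
As a minimal sketch of consuming the pipeline result, the hypothetical helper `save_depth_png` below writes the `depth` entry to disk and returns its size as (height, width). It assumes that `result["depth"]` is a PIL image, which is how recent `transformers` releases return it. A blank stand-in image replaces the real pipeline output so the snippet runs without downloading the model.

```python
import os
import tempfile

from PIL import Image


def save_depth_png(result: dict, path: str) -> tuple:
    """Save result["depth"] as a PNG and return (height, width). Hypothetical helper."""
    depth_image = result["depth"]
    depth_image.save(path, format="PNG")
    width, height = depth_image.size  # PIL reports (width, height)
    return height, width


# Stand-in for the output of `pipe(...)`; assumes "depth" holds a PIL image.
fake_result = {"depth": Image.new("L", (640, 480))}
out_path = os.path.join(tempfile.mkdtemp(), "depth.png")
shape = save_depth_png(fake_result, out_path)
```

With the real pipeline, pass `result` in place of `fake_result`.
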