---
license: mit
---

# DPT 3.1 (Swinv2 backbone)

DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/MiDaS/tree/master).

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

This DPT model uses the [Swinv2](https://huggingface.co/docs/transformers/model_doc/swinv2) model as its backbone and adds a neck and a head on top for monocular depth estimation.

![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dpt_architecture.jpg)

## How to use

Here is how to use this model for zero-shot depth estimation on an image:

```python
from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-swinv2-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-swinv2-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
# (PIL's image.size is (width, height); interpolate expects (height, width))
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction as an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```

Alternatively, you can use the pipeline API:

```python
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-swinv2-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]  # the depth map as a PIL image
```
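
The visualization step in the first example scales the depth map by its maximum only. As a minimal sketch of an alternative, the helper below applies min-max scaling, which uses the full 0 to 255 range and avoids dividing by zero on a constant map. The name `depth_to_uint8` and the synthetic input array are assumptions made for illustration; they are not part of the model's API or the original repository.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a 2D relative-depth map to 0..255 (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min = float(depth.min())
    d_max = float(depth.max())
    if d_max - d_min < 1e-12:
        # Constant map: nothing to scale, so return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return np.round(scaled * 255.0).astype(np.uint8)


# Synthetic stand-in for `prediction.squeeze().cpu().numpy()`.
fake_depth = np.array([[2.0, 4.0], [6.0, 10.0]])
gray = depth_to_uint8(fake_depth)
```

The resulting array can be passed to `Image.fromarray` in the same way as `formatted` in the example above. Note that the model predicts relative depth, so the scaled values are only meaningful within a single image.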