Sketch-126-DomainNet
Sketch-126-DomainNet is an image classification model fine-tuned from google/siglip2-base-patch16-224, a vision-language encoder, for single-label classification. It classifies sketches into 126 categories using the SiglipForImageClassification architecture.
Moment Matching for Multi-Source Domain Adaptation: https://arxiv.org/pdf/1812.01754
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features: https://arxiv.org/pdf/2502.14786
Classification Report:
class precision recall f1-score support
aircraft_carrier 1.0000 0.2200 0.3607 50
alarm_clock 0.9873 0.9568 0.9718 162
ant 0.9432 0.9326 0.9379 89
anvil 0.2727 0.0423 0.0732 71
asparagus 0.9673 0.8916 0.9279 166
axe 0.8034 0.8773 0.8387 163
banana 0.9744 0.9383 0.9560 162
basket 0.7160 0.7682 0.7412 151
bathtub 0.8073 0.9281 0.8635 167
bear 0.8636 0.6690 0.7540 142
bee 0.9196 0.8957 0.9075 115
bird 0.9094 0.9429 0.9259 245
blackberry 1.0000 0.1250 0.2222 48
blueberry 0.6744 0.8529 0.7532 102
bottlecap 0.7468 0.5315 0.6211 111
broccoli 0.7727 0.9444 0.8500 144
bus 0.9302 0.8989 0.9143 178
butterfly 0.9594 0.9497 0.9545 199
cactus 1.0000 0.6735 0.8049 49
cake 0.0000 0.0000 0.0000 54
calculator 0.9298 0.9636 0.9464 55
camel 0.9208 0.8942 0.9073 104
camera 0.9200 0.7931 0.8519 87
candle 0.9556 0.6935 0.8037 62
cannon 0.7500 0.2027 0.3191 74
canoe 0.8000 0.5825 0.6742 103
carrot 0.0000 0.0000 0.0000 27
castle 0.9583 0.5111 0.6667 45
cat 0.8961 0.6635 0.7624 104
ceiling_fan 0.0000 0.0000 0.0000 20
cell_phone 0.0000 0.0000 0.0000 18
cello 0.9600 0.4706 0.6316 51
chair 0.8043 0.4805 0.6016 77
chandelier 0.0000 0.0000 0.0000 27
coffee_cup 0.0000 0.0000 0.0000 26
compass 0.0000 0.0000 0.0000 10
computer 0.2500 0.0435 0.0741 23
cow 0.0000 0.0000 0.0000 14
crab 0.9123 0.8525 0.8814 122
crocodile 0.9280 0.8992 0.9134 129
cruise_ship 0.7467 0.9032 0.8175 124
dog 0.8533 0.8911 0.8718 248
dolphin 0.9091 0.8824 0.8955 68
dragon 0.7914 0.8269 0.8088 156
drums 0.9259 0.8772 0.9009 171
duck 0.8409 0.8409 0.8409 220
dumbbell 0.9507 0.9184 0.9343 147
elephant 0.9630 0.9765 0.9697 213
eyeglasses 0.8155 0.7919 0.8035 173
feather 0.9344 0.9344 0.9344 244
fence 0.8796 0.8482 0.8636 112
fish 0.9527 0.9495 0.9511 297
flamingo 0.9818 0.9474 0.9643 114
flower 0.8267 0.9219 0.8717 269
foot 0.7743 0.8578 0.8140 204
fork 0.9366 0.9433 0.9399 141
frog 0.9620 0.9383 0.9500 162
giraffe 0.9655 0.9396 0.9524 149
goatee 0.7914 0.8897 0.8377 145
grapes 0.9132 0.9609 0.9364 230
guitar 0.8462 0.9862 0.9108 145
hammer 0.8333 0.4386 0.5747 57
helicopter 0.9441 0.9620 0.9530 158
helmet 0.8509 0.8204 0.8354 167
horse 0.9091 0.9877 0.9467 81
kangaroo 0.9592 0.9691 0.9641 97
lantern 0.0000 0.0000 0.0000 30
laptop 0.8273 0.9200 0.8712 250
leaf 0.8449 0.8870 0.8655 301
lion 0.9697 0.9734 0.9715 263
lipstick 0.9634 0.8977 0.9294 88
lobster 0.9265 0.9130 0.9197 138
microphone 0.8917 0.8770 0.8843 122
monkey 0.9297 0.8947 0.9119 133
mosquito 0.9052 0.9211 0.9130 114
mouse 0.8632 0.8039 0.8325 102
mug 0.6928 0.7737 0.7310 137
mushroom 0.8174 0.8861 0.8504 202
onion 0.9538 0.9841 0.9688 126
panda 0.9643 0.8710 0.9153 62
peanut 0.8302 0.8462 0.8381 104
pear 0.7966 0.9658 0.8731 146
peas 0.6667 0.8438 0.7448 64
pencil 0.0000 0.0000 0.0000 21
penguin 0.9586 0.9701 0.9643 167
pig 0.8983 0.8785 0.8883 181
pillow 0.9570 0.9674 0.9622 92
pineapple 0.9808 0.9714 0.9761 105
potato 0.9444 0.5231 0.6733 65
power_outlet 0.5556 0.0676 0.1205 74
purse 0.9220 0.7182 0.8075 181
rabbit 0.9697 0.8767 0.9209 73
raccoon 0.7850 0.9097 0.8428 277
rhinoceros 0.9863 0.9863 0.9863 146
rifle 0.9143 0.9796 0.9458 98
saxophone 0.9381 0.8618 0.8983 246
screwdriver 0.7709 0.8706 0.8177 286
sea_turtle 0.9698 0.9507 0.9602 203
see_saw 0.3296 0.5738 0.4187 413
sheep 0.9254 0.9153 0.9203 366
shoe 0.9395 0.9688 0.9539 513
skateboard 0.7365 0.7831 0.7591 332
snake 0.8005 0.8737 0.8355 372
speedboat 0.8388 0.8833 0.8605 377
spider 0.7954 0.8696 0.8309 514
squirrel 0.8511 0.8484 0.8498 310
strawberry 0.8313 0.8471 0.8391 157
streetlight 0.7944 0.8134 0.8038 209
string_bean 0.7143 0.3000 0.4225 50
submarine 0.5916 0.6975 0.6402 162
swan 0.8966 0.8387 0.8667 186
table 0.6705 0.7522 0.7090 230
teapot 0.8464 0.8968 0.8709 252
teddy-bear 0.6818 0.8385 0.7521 161
television 0.8974 0.7071 0.7910 99
the_Eiffel_Tower 0.9860 0.9679 0.9769 218
the_Great_Wall_of_China 0.6389 0.8440 0.7273 109
tiger 0.9417 0.9604 0.9510 303
toe 0.0000 0.0000 0.0000 53
train 0.8650 0.9010 0.8827 192
truck 0.8136 0.9372 0.8710 191
umbrella 0.8650 0.8913 0.8779 230
vase 0.8082 0.8082 0.8082 146
watermelon 0.8947 0.8333 0.8629 102
whale 0.8910 0.8744 0.8826 215
zebra 0.9817 0.9727 0.9772 220
accuracy 0.8440 19317
macro avg 0.7818 0.7419 0.7475 19317
weighted avg 0.8404 0.8440 0.8352 19317
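The f1-score column is the harmonic mean of precision and recall, which is why a class such as aircraft_carrier, with perfect precision but a recall of 0.22, ends up at 0.3607. The following is a minimal sketch of that arithmetic using three rows copied from the report. The function and variable names are illustrative, and the macro and weighted averages here cover only these three rows, so they will not match the report's full-table figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the classification report
rows = {
    "aircraft_carrier": (1.0000, 0.2200, 50),
    "blackberry": (1.0000, 0.1250, 48),
    "cake": (0.0000, 0.0000, 54),
}

f1 = {name: f1_score(p, r) for name, (p, r, _) in rows.items()}

# Macro average: every class counts equally, regardless of support.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support.
total_support = sum(s for _, _, s in rows.values())
weighted_f1 = sum(f1[name] * s for name, (_, _, s) in rows.items()) / total_support

for name, value in f1.items():
    print(f"{name}: {value:.4f}")
```

Because the macro average treats all classes equally, the classes with zero recall pull the report's macro F1 (0.7475) well below its weighted F1 (0.8352).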
The model categorizes images into the following 126 classes:
- Class 0: "aircraft_carrier"
- Class 1: "alarm_clock"
- Class 2: "ant"
- Class 3: "anvil"
- Class 4: "asparagus"
- Class 5: "axe"
- Class 6: "banana"
- Class 7: "basket"
- Class 8: "bathtub"
- Class 9: "bear"
- Class 10: "bee"
- Class 11: "bird"
- Class 12: "blackberry"
- Class 13: "blueberry"
- Class 14: "bottlecap"
- Class 15: "broccoli"
- Class 16: "bus"
- Class 17: "butterfly"
- Class 18: "cactus"
- Class 19: "cake"
- Class 20: "calculator"
- Class 21: "camel"
- Class 22: "camera"
- Class 23: "candle"
- Class 24: "cannon"
- Class 25: "canoe"
- Class 26: "carrot"
- Class 27: "castle"
- Class 28: "cat"
- Class 29: "ceiling_fan"
- Class 30: "cell_phone"
- Class 31: "cello"
- Class 32: "chair"
- Class 33: "chandelier"
- Class 34: "coffee_cup"
- Class 35: "compass"
- Class 36: "computer"
- Class 37: "cow"
- Class 38: "crab"
- Class 39: "crocodile"
- Class 40: "cruise_ship"
- Class 41: "dog"
- Class 42: "dolphin"
- Class 43: "dragon"
- Class 44: "drums"
- Class 45: "duck"
- Class 46: "dumbbell"
- Class 47: "elephant"
- Class 48: "eyeglasses"
- Class 49: "feather"
- Class 50: "fence"
- Class 51: "fish"
- Class 52: "flamingo"
- Class 53: "flower"
- Class 54: "foot"
- Class 55: "fork"
- Class 56: "frog"
- Class 57: "giraffe"
- Class 58: "goatee"
- Class 59: "grapes"
- Class 60: "guitar"
- Class 61: "hammer"
- Class 62: "helicopter"
- Class 63: "helmet"
- Class 64: "horse"
- Class 65: "kangaroo"
- Class 66: "lantern"
- Class 67: "laptop"
- Class 68: "leaf"
- Class 69: "lion"
- Class 70: "lipstick"
- Class 71: "lobster"
- Class 72: "microphone"
- Class 73: "monkey"
- Class 74: "mosquito"
- Class 75: "mouse"
- Class 76: "mug"
- Class 77: "mushroom"
- Class 78: "onion"
- Class 79: "panda"
- Class 80: "peanut"
- Class 81: "pear"
- Class 82: "peas"
- Class 83: "pencil"
- Class 84: "penguin"
- Class 85: "pig"
- Class 86: "pillow"
- Class 87: "pineapple"
- Class 88: "potato"
- Class 89: "power_outlet"
- Class 90: "purse"
- Class 91: "rabbit"
- Class 92: "raccoon"
- Class 93: "rhinoceros"
- Class 94: "rifle"
- Class 95: "saxophone"
- Class 96: "screwdriver"
- Class 97: "sea_turtle"
- Class 98: "see_saw"
- Class 99: "sheep"
- Class 100: "shoe"
- Class 101: "skateboard"
- Class 102: "snake"
- Class 103: "speedboat"
- Class 104: "spider"
- Class 105: "squirrel"
- Class 106: "strawberry"
- Class 107: "streetlight"
- Class 108: "string_bean"
- Class 109: "submarine"
- Class 110: "swan"
- Class 111: "table"
- Class 112: "teapot"
- Class 113: "teddy-bear"
- Class 114: "television"
- Class 115: "the_Eiffel_Tower"
- Class 116: "the_Great_Wall_of_China"
- Class 117: "tiger"
- Class 118: "toe"
- Class 119: "train"
- Class 120: "truck"
- Class 121: "umbrella"
- Class 122: "vase"
- Class 123: "watermelon"
- Class 124: "whale"
- Class 125: "zebra"
Run with Transformers🤗
!pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from transformers.image_utils import load_image
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/Sketch-126-DomainNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
def sketch_classification(image):
    """Predicts the sketch category for an input image."""
    # Convert the input numpy array to a PIL Image and ensure it has 3 channels (RGB)
    image = Image.fromarray(image).convert("RGB")
    # Process the image and prepare it for the model
    inputs = processor(images=image, return_tensors="pt")
    # Perform inference without gradient calculation
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Convert logits to probabilities using softmax
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    # Mapping from indices to corresponding sketch category labels
    labels = {
        "0": "aircraft_carrier", "1": "alarm_clock", "2": "ant", "3": "anvil", "4": "asparagus",
        "5": "axe", "6": "banana", "7": "basket", "8": "bathtub", "9": "bear",
        "10": "bee", "11": "bird", "12": "blackberry", "13": "blueberry", "14": "bottlecap",
        "15": "broccoli", "16": "bus", "17": "butterfly", "18": "cactus", "19": "cake",
        "20": "calculator", "21": "camel", "22": "camera", "23": "candle", "24": "cannon",
        "25": "canoe", "26": "carrot", "27": "castle", "28": "cat", "29": "ceiling_fan",
        "30": "cell_phone", "31": "cello", "32": "chair", "33": "chandelier", "34": "coffee_cup",
        "35": "compass", "36": "computer", "37": "cow", "38": "crab", "39": "crocodile",
        "40": "cruise_ship", "41": "dog", "42": "dolphin", "43": "dragon", "44": "drums",
        "45": "duck", "46": "dumbbell", "47": "elephant", "48": "eyeglasses", "49": "feather",
        "50": "fence", "51": "fish", "52": "flamingo", "53": "flower", "54": "foot",
        "55": "fork", "56": "frog", "57": "giraffe", "58": "goatee", "59": "grapes",
        "60": "guitar", "61": "hammer", "62": "helicopter", "63": "helmet", "64": "horse",
        "65": "kangaroo", "66": "lantern", "67": "laptop", "68": "leaf", "69": "lion",
        "70": "lipstick", "71": "lobster", "72": "microphone", "73": "monkey", "74": "mosquito",
        "75": "mouse", "76": "mug", "77": "mushroom", "78": "onion", "79": "panda",
        "80": "peanut", "81": "pear", "82": "peas", "83": "pencil", "84": "penguin",
        "85": "pig", "86": "pillow", "87": "pineapple", "88": "potato", "89": "power_outlet",
        "90": "purse", "91": "rabbit", "92": "raccoon", "93": "rhinoceros", "94": "rifle",
        "95": "saxophone", "96": "screwdriver", "97": "sea_turtle", "98": "see_saw", "99": "sheep",
        "100": "shoe", "101": "skateboard", "102": "snake", "103": "speedboat", "104": "spider",
        "105": "squirrel", "106": "strawberry", "107": "streetlight", "108": "string_bean",
        "109": "submarine", "110": "swan", "111": "table", "112": "teapot", "113": "teddy-bear",
        "114": "television", "115": "the_Eiffel_Tower", "116": "the_Great_Wall_of_China",
        "117": "tiger", "118": "toe", "119": "train", "120": "truck", "121": "umbrella",
        "122": "vase", "123": "watermelon", "124": "whale", "125": "zebra"
    }
    # Create a dictionary mapping each label to its predicted probability (rounded)
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions
# Create Gradio interface
iface = gr.Interface(
    fn=sketch_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Sketch-126-DomainNet Classification",
    description="Upload a sketch to classify it into one of 126 categories."
)
# Launch the app
if __name__ == "__main__":
    iface.launch()
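The function above returns a score for each of the 126 labels, which is more than most callers need. The following is a minimal sketch of trimming that dictionary to the highest-scoring entries. The helper name top_k and the example scores are hypothetical stand-ins, not part of the model card's code or real model output.

```python
from typing import Dict, List, Tuple


def top_k(predictions: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, probability) pairs, best first."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores standing in for the dictionary returned by sketch_classification
example_scores = {"cat": 0.612, "tiger": 0.201, "lion": 0.094, "dog": 0.05, "bear": 0.043}

best = top_k(example_scores, k=3)
print(best)
```

Inside the Gradio app, a similar effect should be available by passing num_top_classes to gr.Label, which limits how many classes the component displays.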
Intended Use:
The Sketch-126-DomainNet model is designed for sketch image classification. It categorizes sketches into a wide range of classes, from objects such as "aircraft_carrier" or "alarm_clock" to animals, plants, and everyday items. Potential use cases include:
- Art and Design Applications: Assisting artists and designers in organizing and retrieving sketches based on content.
- Creative Search Engines: Enabling sketch-based search for design inspiration.
- Educational Tools: Helping students and educators in art and design fields with categorization and retrieval of visual resources.
- Computer Vision Research: Providing a baseline model for sketch recognition and domain adaptation tasks.
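The organizing and retrieval use cases can be sketched as grouping files by their top predicted label. The example below is one possible approach under stated assumptions: group_by_prediction, the "unsorted" bucket, and the 0.5 threshold are all hypothetical choices. The classifier is passed in as a callable, so a stub with made-up scores stands in for the real model here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_prediction(
    paths: Iterable[str],
    classify: Callable[[str], Dict[str, float]],
    min_score: float = 0.5,
) -> Dict[str, List[str]]:
    """Bucket sketch files under their top predicted label.

    Files whose best score falls below min_score go to "unsorted".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        scores = classify(path)
        label, score = max(scores.items(), key=lambda item: item[1])
        groups[label if score >= min_score else "unsorted"].append(path)
    return dict(groups)


# Stub classifier with made-up scores, used only to demonstrate the grouping
def fake_classify(path: str) -> Dict[str, float]:
    table = {
        "a.png": {"cat": 0.9, "dog": 0.1},
        "b.png": {"cat": 0.2, "dog": 0.8},
        "c.png": {"cat": 0.4, "dog": 0.35, "bear": 0.25},
    }
    return table[path]


grouped = group_by_prediction(["a.png", "b.png", "c.png"], fake_classify)
print(grouped)
```

A confidence threshold matters here because the classification report shows several classes with zero recall, so a low-confidence top label is better set aside for manual review than filed automatically.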