Hand-Gesture-19 is an image classification model fine-tuned from the vision-language encoder google/siglip2-base-patch16-224 for a single-label classification task. It classifies hand-gesture images into nineteen categories using the SiglipForImageClassification architecture.
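For a quick sanity check outside the Gradio demo further below, a minimal single-image, top-1 inference sketch looks like this; gesture.jpg is a placeholder path, and the class names are assumed to be available through the usual model.config.id2label mapping.
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
processor = AutoImageProcessor.from_pretrained(model_name)
model = SiglipForImageClassification.from_pretrained(model_name)
model.eval()

# "gesture.jpg" is a placeholder path; use any RGB hand-gesture photo.
image = Image.open("gesture.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Single-label prediction: take the highest-scoring of the nineteen classes.
print(model.config.id2label[logits.argmax(-1).item()])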
Classification Report:
                 precision    recall  f1-score   support

           call     0.9889    0.9739    0.9813      6939
        dislike     0.9892    0.9863    0.9877      7028
           fist     0.9956    0.9923    0.9940      6882
           four     0.9632    0.9653    0.9643      7183
           like     0.9668    0.9855    0.9760      6823
           mute     0.9848    0.9976    0.9912      7139
     no_gesture     0.9960    0.9957    0.9958     27823
             ok     0.9872    0.9831    0.9852      6924
            one     0.9817    0.9854    0.9835      7062
           palm     0.9793    0.9848    0.9820      7050
          peace     0.9723    0.9635    0.9679      6965
 peace_inverted     0.9806    0.9836    0.9821      6876
           rock     0.9853    0.9865    0.9859      6883
           stop     0.9614    0.9901    0.9756      6893
  stop_inverted     0.9933    0.9712    0.9821      7142
          three     0.9712    0.9478    0.9594      6940
         three2     0.9785    0.9799    0.9792      6870
         two_up     0.9848    0.9863    0.9855      7346
two_up_inverted     0.9855    0.9871    0.9863      6967

       accuracy                         0.9833    153735
      macro avg     0.9813    0.9814    0.9813    153735
   weighted avg     0.9833    0.9833    0.9833    153735
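The report above follows scikit-learn's classification_report layout (per-class precision, recall, F1 and support, four decimal places). A sketch of how such a report could be regenerated is shown below; eval_paths and eval_labels are placeholders for a labelled evaluation split, which is not specified here.
import torch
from PIL import Image
from sklearn.metrics import classification_report
from transformers import AutoImageProcessor, SiglipForImageClassification

model_name = "prithivMLmods/Hand-Gesture-19"
processor = AutoImageProcessor.from_pretrained(model_name)
model = SiglipForImageClassification.from_pretrained(model_name)
model.eval()

# Placeholder evaluation split: replace with the real labelled image paths and
# integer class ids (matching model.config.id2label).
eval_paths = ["example_call.jpg"]
eval_labels = [0]

y_true, y_pred = [], []
for path, label in zip(eval_paths, eval_labels):
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    y_true.append(label)
    y_pred.append(logits.argmax(-1).item())

class_names = [model.config.id2label[i] for i in range(model.config.num_labels)]
print(classification_report(
    y_true, y_pred,
    labels=list(range(model.config.num_labels)),
    target_names=class_names,
    digits=4,
))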
The model categorizes images into nineteen hand gestures: call, dislike, fist, four, like, mute, no_gesture, ok, one, palm, peace, peace_inverted, rock, stop, stop_inverted, three, three2, two_up, and two_up_inverted (the index-to-label mapping appears in the code below).
To run the model with Transformers, install the dependencies and use the Gradio demo below:
!pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch
# Load model and processor
model_name = "prithivMLmods/Hand-Gesture-19"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)
def hand_gesture_classification(image):
    """Predicts the hand gesture category from an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map class indices to gesture names
    labels = {
        "0": "call",
        "1": "dislike",
        "2": "fist",
        "3": "four",
        "4": "like",
        "5": "mute",
        "6": "no_gesture",
        "7": "ok",
        "8": "one",
        "9": "palm",
        "10": "peace",
        "11": "peace_inverted",
        "12": "rock",
        "13": "stop",
        "14": "stop_inverted",
        "15": "three",
        "16": "three2",
        "17": "two_up",
        "18": "two_up_inverted"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions
# Create Gradio interface
iface = gr.Interface(
    fn=hand_gesture_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Hand Gesture Classification",
    description="Upload an image to classify the hand gesture."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
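Because hand_gesture_classification returns a plain dict of gesture names to probabilities, it can also be used programmatically (for example in a notebook that defines the function above) without launching the interface; sample.jpg is a placeholder path.
import numpy as np
from PIL import Image

# "sample.jpg" is a placeholder path; any hand-gesture photo works.
scores = hand_gesture_classification(np.array(Image.open("sample.jpg").convert("RGB")))
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top3)  # the three highest-scoring gestures, best first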
The Hand-Gesture-19 model is designed to classify hand-gesture images into the nineteen categories listed above, making it suitable for gesture-recognition and other vision-based human-computer interaction applications.