Human Presence Classification

A CLIP-based linear probe (logistic regression) classification model that detects the presence of humans in fashion-domain images.

@author: Adham Elarabawy (www.adhamelarabawy.com)

Overview

I needed a human presence classification model to help structure a very large scraped dataset of fashion imagery. CLIP-based similarity scoring was not sufficient: a similarity threshold strict enough to reach the desired precision would have dropped a substantial share of the images. I therefore trained a logistic regression model on top of CLIP image features as a linear probe, using DeepFashion paired images. It achieved 100% accuracy on the test set (20% of the data, ~2k images). The model is definitely overfit to fashion imagery, but that is fine since fashion is the downstream use case. Inference is extremely low latency, especially if you have already encoded your images with the ViT-B/32 CLIP variant.

On an A10, it takes ~23 milliseconds to encode an image and ~0.28 milliseconds to classify its features.
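
For readers unfamiliar with the term, a linear probe here means fitting a logistic regression on frozen CLIP image embeddings. The sketch below illustrates the idea under stated assumptions: random 512-dimensional vectors stand in for ViT-B/32 features, the labels are synthetic, and the hyperparameters (`max_iter`, the split seed) are illustrative rather than the ones used to train this model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for CLIP ViT-B/32 image features (512-d).
# Real features would come from clip_model.encode_image(...).
n_samples, dim = 1000, 512
labels = rng.integers(0, 2, size=n_samples).astype(bool)  # True = has human
features = rng.normal(size=(n_samples, dim)).astype(np.float32)
features[labels] += 0.5  # shift the positive class so the toy data is separable

# 80/20 train/test split, mirroring the 20% test set mentioned above.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0, stratify=labels
)

# The "probe" is a plain logistic regression; the encoder itself is never updated.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

test_accuracy = probe.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {test_accuracy:.3f}")
```

Because only the small linear layer is trained, fitting takes seconds, and classification at inference time is a single matrix-vector product on top of the cached embedding.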

Dataset

I used a subset of DeepFashion v1 to curate a dataset of paired images: a garment on its own, and the same garment worn by a person. I then used this pairing to create the final dataset with binary human-presence labels. Some notes:

  • The images seem to predominantly depict women.
  • The human models seem to cover most ethnicities and body types well. Early analysis has not shown any ethnicity or body type bias.
  • Most, if not all, of the images have a white background. In my testing, the model still generalizes quite well to other domains (natural or diverse backgrounds and poses).
  • My hypothesis is that the paired nature of the data allowed the model to pick up on the correct features, which has made it very robust.

[Figure: example images of the presence case and the absence case]
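
The pairing-to-labels step described above can be sketched as follows. This is a minimal illustration, not the actual curation script: the file names, the `pairs` structure, and the `build_labeled_dataset` helper are hypothetical. Splitting by pair (so both images of a garment land in the same split) is also an assumption on my part, not something confirmed for the original dataset.

```python
import random

def build_labeled_dataset(pairs, test_fraction=0.2, seed=0):
    """Turn (garment_only, on_model) image pairs into binary-labeled train/test splits."""
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)

    # ASSUMPTION: split by pair so both images of a garment stay in the same split.
    n_test = int(len(pairs) * test_fraction)
    test_pairs, train_pairs = pairs[:n_test], pairs[n_test:]

    def expand(split):
        examples = []
        for garment_path, on_model_path in split:
            examples.append((garment_path, False))  # garment alone: no human
            examples.append((on_model_path, True))  # garment worn: human present
        return examples

    return expand(train_pairs), expand(test_pairs)

# Hypothetical file names, for illustration only.
pairs = [(f"garment_{i}.jpg", f"model_{i}.jpg") for i in range(10)]
train, test = build_labeled_dataset(pairs)
print(len(train), len(test))
```

Each pair contributes exactly one positive and one negative example, so the resulting dataset is balanced by construction.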

Usage

import clip
import torch
import pickle
import sklearn  # scikit-learn must be installed to unpickle the classifier
import time
from PIL import Image
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device)

repo_id = "adhamelarabawy/fashion_human_classifier"
model_path = hf_hub_download(repo_id=repo_id, filename="model.pkl")

with open(model_path, 'rb') as file:
    human_classifier = pickle.load(file)

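# Load the image to classify ("image.jpg" is a placeholder path)
img = Image.open("image.jpg")

# Time the CLIP encoding step
start = time.time()
with torch.no_grad():
    features = clip_model.encode_image(clip_preprocess(img).unsqueeze(0).to(device)).cpu().numpy()
encode_time = time.time() - start

# Time the classification step separately
start = time.time()
pred = human_classifier.predict(features) # True = has human, False = no human
pred_time = time.time() - start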

print(f"Encode time: {encode_time*1000:.3f} milliseconds")
print(f"Prediction time: {pred_time*1000:.3f} milliseconds")
print(f"Prediction (has_human): {pred}")
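
If the CLIP features are already cached, classification reduces to a single call on a feature matrix. The sketch below assumes the pickled model is a scikit-learn `LogisticRegression`, so that `predict_proba` is available. A small stand-in classifier trained on synthetic data replaces `human_classifier`, and the 0.9 threshold is an arbitrary example value, not a recommended setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for the downloaded classifier, trained on synthetic 512-d data.
train_labels = np.array([False, True] * 100)
train_features = rng.normal(size=(200, 512)).astype(np.float32)
train_features[train_labels] += 0.5
classifier = LogisticRegression(max_iter=1000).fit(train_features, train_labels)

# ASSUMPTION: stand-in for a cached (n_images, 512) matrix of CLIP image features.
cached_features = rng.normal(size=(8, 512)).astype(np.float32)
cached_features[:4] += 0.5  # first four rows mimic the "has human" class

# predict_proba columns follow classifier.classes_, so look up the True column.
human_column = list(classifier.classes_).index(True)
probabilities = classifier.predict_proba(cached_features)[:, human_column]

# Example threshold only; raise it for higher precision, lower it for higher recall.
has_human = probabilities >= 0.9
print(has_human)
```

Thresholding the probability rather than calling `predict` makes it possible to trade precision against recall when filtering a large scraped dataset.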