citizenlab/distilbert-base-multilingual-cased-toxicity

This is a multilingual DistilBERT sequence classifier trained on the Jigsaw Toxic Comment Classification Challenge dataset.

How to use it

from transformers import pipeline

model_path = "citizenlab/distilbert-base-multilingual-cased-toxicity"

# Load the model and its tokenizer into a text-classification pipeline.
toxicity_classifier = pipeline("text-classification", model=model_path, tokenizer=model_path)

toxicity_classifier("this is a lovely message")
# > [{'label': 'not_toxic', 'score': 0.9954179525375366}]

toxicity_classifier("you are an idiot and you and your family should go back to your country")
# > [{'label': 'toxic', 'score': 0.9948776960372925}]
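
The classifier returns a list with one dict per input, holding a label and a score. In practice you may want a plain yes/no decision with a stricter cutoff than the model's default. The sketch below is one way to do that. The helper name `is_toxic` and the `threshold` parameter are hypothetical and not part of the model or of transformers; the sample inputs are the two outputs shown above.

```python
def is_toxic(predictions, threshold=0.5):
    """Return True if the top prediction is 'toxic' with score >= threshold.

    `predictions` is assumed to have the shape shown above:
    a list containing one {'label': ..., 'score': ...} dict.
    """
    top = predictions[0]
    return top["label"] == "toxic" and top["score"] >= threshold


# Outputs copied from the usage example, so no model download is needed here.
lovely = [{"label": "not_toxic", "score": 0.9954179525375366}]
insult = [{"label": "toxic", "score": 0.9948776960372925}]

print(is_toxic(lovely))                   # False
print(is_toxic(insult))                   # True
print(is_toxic(insult, threshold=0.999))  # False: score is below the stricter cutoff
```

Raising the threshold trades recall for precision: fewer messages are flagged, but those that are flagged carry higher model confidence.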

Evaluation

Accuracy

Accuracy Score = 0.9425
F1 Score (Micro) = 0.9450549450549449
F1 Score (Macro) = 0.8491432341169309
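
The card does not say how these scores were computed, so the following is only an illustration of what micro- and macro-averaged F1 mean for a two-class toxicity task. The function `f1_scores` and the toy labels are made up for this sketch and are not the evaluation data behind the numbers above. Micro averaging pools the counts over all classes, while macro averaging takes the unweighted mean of the per-class F1 scores, so a weak minority class pulls the macro score down more.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Toy, imbalanced example: 8 non-toxic and 2 toxic comments, one toxic one missed.
y_true = ["not_toxic"] * 8 + ["toxic"] * 2
y_pred = ["not_toxic"] * 8 + ["not_toxic", "toxic"]

micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.4f}")  # 0.9000
print(f"macro F1 = {macro:.4f}")  # 0.8039
```

In this toy case the macro score is lower than the micro score because the rarer `toxic` class is predicted less accurately than the common `not_toxic` class.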