Model Card for RoBERTa Toxicity Classifier

This model is a fine-tuned version of RoBERTa-base for toxicity classification, capable of identifying six different types of toxic content in text.

Model Details

Model Description

This model is a fine-tuned version of RoBERTa-base, trained to identify toxic content across multiple categories. It was developed to help identify and moderate harmful content in text data.

  • Developed by: Bonnavaud Laura, Cousseau Martin, Laborde Stanislas, Rady Othmane, Satouri Amani
  • Model type: RoBERTa-based text classification
  • Language(s): English
  • License: MIT
  • Finetuned from model: facebook/roberta-base

Uses

Direct Use

The model can be used directly for the following tasks; a minimal inference sketch follows the list:

  • Content moderation
  • Toxic comment detection
  • Online safety monitoring
  • Comment filtering systems
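A minimal inference sketch with the Transformers library is shown below. The checkpoint id (TheRealM4rtin/roBERToxico) is taken from this repository; the label names and their order are assumed to match the six Jigsaw categories listed under Training Data and should be verified against the model config.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "TheRealM4rtin/roBERToxico"  # repository id from this card
# Assumed label order; verify against model.config.id2label.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def predict(texts, threshold=0.5):
    # Same max sequence length as used in training (128).
    enc = tokenizer(texts, truncation=True, max_length=128, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    # Multi-label setup: an independent sigmoid per label, not a softmax across labels.
    probs = torch.sigmoid(logits)
    return [
        {label: round(float(p), 3) for label, p in zip(LABELS, row) if p >= threshold}
        for row in probs
    ]

print(predict(["Have a great day!", "I will find you and hurt you."]))
```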

Out-of-Scope Use

This model should not be used for:

  • Legal decision making
  • Automated content removal without human review
  • Processing non-English content
  • Making definitive judgments about individuals or groups

Bias, Risks, and Limitations

  • The model may reflect biases present in the training data
  • Performance may vary across different demographics and contexts
  • False positives/negatives can occur and should be considered in deployment
  • Not suitable for high-stakes decisions without human oversight

Recommendations

Users should:

  • Implement human review processes alongside model predictions
  • Monitor model performance across different demographic groups
  • Use confidence thresholds appropriate for their use case (see the threshold-tuning sketch after this list)
  • Be transparent about the use of automated toxicity detection
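For the threshold recommendation above, one common approach is to tune a separate probability threshold per label on held-out data rather than using a fixed 0.5 cut-off. The sketch below is a hypothetical illustration using scikit-learn; it is not part of this model's training code.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_thresholds(y_true, y_prob):
    """Pick the F1-maximizing threshold per label.
    y_true, y_prob: arrays of shape (n_samples, n_labels)."""
    thresholds = []
    for k in range(y_true.shape[1]):
        prec, rec, thr = precision_recall_curve(y_true[:, k], y_prob[:, k])
        f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
        # precision_recall_curve returns one more (prec, rec) point than thresholds,
        # so drop the last F1 value before aligning with thr.
        thresholds.append(float(thr[np.argmax(f1[:-1])]))
    return thresholds
```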

Training Details

Training Data

The model was trained on the Jigsaw Toxic Comment Classification Challenge dataset, which includes comments labeled for toxic content across six categories:

  • Toxic
  • Severe Toxic
  • Obscene
  • Threat
  • Insult
  • Identity Hate

The dataset was split into training and test sets with a 90-10 ratio, using stratified sampling on the sum of the toxic labels to keep the class distribution balanced across splits. Empty comments were filled with empty strings, and all texts were cleaned and tokenized in batches of 48 samples.
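A sketch of the described split and preprocessing is shown below, assuming the data is loaded into a pandas DataFrame with the standard Jigsaw column names (comment_text plus the six label columns); the column names and random seed are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("train.csv")                       # Jigsaw training CSV
df["comment_text"] = df["comment_text"].fillna("")  # fill missing comments with empty strings
strata = df[LABELS].sum(axis=1)                     # stratify on the number of toxic labels per comment
train_df, test_df = train_test_split(
    df, test_size=0.1, stratify=strata, random_state=42  # 90-10 split
)
```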

Training Procedure

Training Hyperparameters

  • Training regime: FP16 mixed precision
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Batch size: 320
  • Epochs: Up to 40 with early stopping (patience=15)
  • Max sequence length: 128
  • Warmup ratio: 0.1
  • Weight decay: 0.1
  • Gradient accumulation steps: 2
  • Scheduler: Linear
  • DataLoader workers: 2
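The hyperparameters above map onto transformers TrainingArguments roughly as sketched below. How the batch size of 320 is split across the four GPUs and the two gradient-accumulation steps is an assumption, as are any argument values not stated in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-toxicity",
    fp16=True,                        # mixed precision
    learning_rate=2e-5,
    per_device_train_batch_size=40,   # assumption: 40 x 4 GPUs x 2 accumulation steps = 320
    gradient_accumulation_steps=2,
    num_train_epochs=40,
    warmup_ratio=0.1,
    weight_decay=0.1,
    lr_scheduler_type="linear",
    dataloader_num_workers=2,
    dataloader_pin_memory=True,
    evaluation_strategy="epoch",      # named eval_strategy in recent transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)
# Early stopping (patience=15) is typically added via transformers.EarlyStoppingCallback
# when constructing the Trainer.
```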

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on a held-out test set from the Jigsaw dataset.

Metrics

The model was evaluated with standard multi-label classification metrics:

Per class metrics:

  • Accuracy
  • Precision
  • Recall
  • F1 Score

Aggregate metrics:

  • Overall accuracy
  • Macro-averaged metrics:
    • Macro Precision
    • Macro Recall
    • Macro F1
  • Micro-averaged metrics:
    • Micro Precision
    • Micro Recall
    • Micro F1

Best model selection was based on F1 score during training.
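The aggregate metrics above can be computed with scikit-learn as sketched below (a 0.5 decision threshold is used for illustration; per-class values follow from average=None).

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(y_true, y_prob, threshold=0.5):
    """y_true, y_prob: arrays of shape (n_samples, 6)."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    # For multi-label inputs, accuracy_score reports exact-match (subset) accuracy.
    metrics = {"accuracy": accuracy_score(y_true, y_pred)}
    for avg in ("macro", "micro"):
        p, r, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average=avg, zero_division=0
        )
        metrics.update({f"{avg}_precision": p, f"{avg}_recall": r, f"{avg}_f1": f1})
    return metrics
```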

Environmental Impact

  • Hardware Type: 4x NVIDIA A10 24GB
  • Training time: approximately 20 minutes
  • Cloud Provider: ESIEA Cluster

Technical Specifications

Model Architecture and Technical Details

  • Base model: RoBERTa-base
  • Problem type: Multi-label classification
  • Number of labels: 6
  • Output layers: Linear classification head for multi-label prediction
  • Number of parameters: ~125M
  • Training optimizations:
    • Distributed Data Parallel (DDP) support with NCCL backend
    • FP16 mixed precision training
    • Memory optimizations:
      • Gradient accumulation (steps=2)
      • DataLoader pinned memory
      • Efficient batch processing
  • Caching system for tokenized data to improve training efficiency
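The architecture described above corresponds to the standard way of attaching a multi-label classification head to RoBERTa-base with transformers; the snippet below is a sketch of that instantiation, not the original training script.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",                             # base checkpoint named in this card
    num_labels=6,
    problem_type="multi_label_classification",  # uses BCE-with-logits loss per label
)
```

Multi-GPU training with DDP over the NCCL backend is typically launched with torchrun, which handles process-group setup across the four GPUs.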

Hardware Requirements

Minimum requirements for inference:

  • RAM: 4GB
  • CPU: Modern processor supporting AVX instructions
  • GPU: Optional, but recommended for batch processing