Model Card for DistilBERT Fine-Tuned on IMDB Sentiment Analysis

Model Details

Model Description

This model is a fine-tuned version of distilbert-base-uncased, trained on the IMDB movie reviews dataset for binary sentiment classification: it labels a movie review as positive (1) or negative (0).

  • Developed by: Nikke Salonen
  • Finetuned from model: distilbert-base-uncased
  • Language(s): English
  • License: Apache 2.0

Uses

Direct Use

  • Sentiment analysis of English text reviews.
  • Can be used for opinion mining on movie reviews and similar datasets.

Downstream Use

  • Can be further fine-tuned for sentiment classification in other domains (e.g., product reviews, social media posts).

Out-of-Scope Use

  • Not suitable for languages other than English.
  • Not recommended for high-stakes decision-making without human oversight.

Bias, Risks, and Limitations

  • The model is trained exclusively on IMDB movie reviews, so it may not generalize well to other domains or text styles.
  • May exhibit biases present in the training data.
  • Sentiment classification depends heavily on context, and the model may misinterpret sarcasm or complex sentences.

Recommendations

  • Users should evaluate the model on their specific datasets before deploying in production.
  • If biases are detected, consider fine-tuning on a more diverse dataset.

How to Use the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("NikkeS/imdb-distilbert")
tokenizer = AutoTokenizer.from_pretrained("NikkeS/imdb-distilbert")
model.eval()  # disable dropout for deterministic inference

def predict_sentiment(review):
    inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = torch.argmax(logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example Usage
print(predict_sentiment("This movie was absolutely fantastic!"))
print(predict_sentiment("The acting was terrible, and the story made no sense."))

Training Details

Training Data

  • The model was fine-tuned on the IMDB dataset (50,000 labeled movie reviews).
  • The dataset is balanced (25,000 positive and 25,000 negative reviews).
  • The training split consisted of 40,000 samples, while 5,000 samples were used for validation.
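
The IMDB dataset ships as an official 25,000/25,000 train/test split, so the 40,000/5,000/5,000 partition above implies re-splitting the pooled data. Below is a minimal sketch of one way to do that with the datasets library; the seed and split method are illustrative assumptions, not the exact procedure used for this model.

from datasets import load_dataset, concatenate_datasets

# Load IMDB (25,000 train + 25,000 test labeled reviews)
imdb = load_dataset("imdb")

# Pool the official splits, shuffle, and carve out the
# 40,000/5,000/5,000 partition described above.
# seed=42 is an assumption for illustration only.
pooled = concatenate_datasets([imdb["train"], imdb["test"]]).shuffle(seed=42)
train_ds = pooled.select(range(40_000))
val_ds = pooled.select(range(40_000, 45_000))
test_ds = pooled.select(range(45_000, 50_000))

print(len(train_ds), len(val_ds), len(test_ds))  # 40000 5000 5000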

Training Procedure

Preprocessing

  • Tokenized using distilbert-base-uncased tokenizer.
  • Applied dynamic padding, truncation, and a max sequence length of 256.
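
A minimal sketch of that preprocessing, continuing from the split sketch above; padding is deferred to a DataCollatorWithPadding so that each batch is only padded to its longest sequence (dynamic padding).

from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to 256 tokens; no padding here, since the collator
    # pads each batch dynamically at training time
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)
val_ds = val_ds.map(tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)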

Training Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 16
  • Epochs: 2
  • Optimizer: AdamW
  • Loss Function: Cross-Entropy Loss

Compute Infrastructure

  • Hardware: Google Colab T4 GPU
  • Precision: Mixed precision (fp16=True for efficiency)
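
Taken together, these settings map onto a standard Trainer setup. The sketch below reflects the hyperparameters listed above but is not the exact training script; AdamW and cross-entropy loss are the defaults that Trainer and AutoModelForSequenceClassification apply, so they need no explicit configuration.

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters from the list above; AdamW is the default
# optimizer and cross-entropy the default loss for sequence
# classification with integer labels
args = TrainingArguments(
    output_dir="imdb-distilbert",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    fp16=True,  # mixed precision on the T4 GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()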

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • The model was evaluated on a 5,000-sample test set from the IMDB dataset.

Metrics

  • Accuracy: 90.4%
  • Precision: 92.1%
  • Recall: 88.2%
  • F1-score: 90.0%
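
These are the standard binary-classification metrics. A minimal sketch of computing them with scikit-learn, assuming y_true and y_pred are integer label arrays (0 = negative, 1 = positive) collected from the test set with the prediction code shown earlier:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    # The positive class (label 1) is the reference class for
    # precision, recall, and F1
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1
    )
    print(f"Accuracy:  {acc:.1%}")
    print(f"Precision: {prec:.1%}")
    print(f"Recall:    {rec:.1%}")
    print(f"F1-score:  {f1:.1%}")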

Model Examination

  • The model performs well on general sentiment classification but may struggle with sarcasm, irony, or very short reviews.

Citation

If you use this model, please cite:

@article{salonen2025imdb-distilbert,
  title={Fine-tuned DistilBERT for Sentiment Analysis on IMDB Reviews},
  author={Nikke Salonen},
  year={2025}
}

Model Card Authors

  • Nikke Salonen

Contact

For questions or issues, contact [email protected].

