Model Card for DistilBERT Fine-Tuned on IMDB Sentiment Analysis

Model Details

Model Description

This model is a fine-tuned version of distilbert-base-uncased, trained on the IMDB movie reviews dataset for binary sentiment classification: it labels a movie review as positive (1) or negative (0).

  • Developed by: Nikke Salonen
  • Finetuned from model: distilbert-base-uncased
  • Language(s): English
  • License: Apache 2.0

Uses

Direct Use

  • Sentiment analysis of English text reviews.
  • Can be used for opinion mining on movie reviews and similar datasets.

Downstream Use

  • Can be further fine-tuned for sentiment classification in other domains (e.g., product reviews, social media posts).

Out-of-Scope Use

  • Not suitable for languages other than English.
  • Not recommended for high-stakes decision-making without human oversight.

Bias, Risks, and Limitations

  • The model is trained exclusively on IMDB movie reviews, so it may not generalize well to other domains or text styles.
  • May exhibit biases present in the training data.
  • Sentiment classification depends heavily on context, and the model may misinterpret sarcasm or complex sentences.

Recommendations

  • Users should evaluate the model on their specific datasets before deploying in production.
  • If biases are detected, consider fine-tuning on a more diverse dataset.

How to Use the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained("NikkeS/imdb-distilbert")
tokenizer = AutoTokenizer.from_pretrained("NikkeS/imdb-distilbert")
model.eval()  # disable dropout for deterministic inference

def predict_sentiment(review):
    inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True, max_length=256)
    with torch.no_grad():
        logits = model(**inputs).logits
    prediction = torch.argmax(logits, dim=1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example Usage
print(predict_sentiment("This movie was absolutely fantastic!"))
print(predict_sentiment("The acting was terrible, and the story made no sense."))

Training Details

Training Data

  • The model was fine-tuned on the IMDB dataset (50,000 labeled movie reviews).
  • The dataset is balanced (25,000 positive and 25,000 negative reviews).
  • The training split consisted of 40,000 samples, while 5,000 samples were used for validation.
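
The IMDB dataset ships as an official 25,000/25,000 train/test split, so the 40,000/5,000/5,000 partition above implies re-splitting the pooled data. Below is a minimal sketch of one way to do that with the datasets library; the seed and split method are illustrative assumptions, not the exact procedure used for this model.

from datasets import load_dataset, concatenate_datasets

# Load IMDB (25,000 train + 25,000 test labeled reviews)
imdb = load_dataset("imdb")

# Pool the official splits, shuffle, and carve out the
# 40,000/5,000/5,000 partition described above.
# seed=42 is an assumption for illustration only.
pooled = concatenate_datasets([imdb["train"], imdb["test"]]).shuffle(seed=42)
train_ds = pooled.select(range(40_000))
val_ds = pooled.select(range(40_000, 45_000))
test_ds = pooled.select(range(45_000, 50_000))

print(len(train_ds), len(val_ds), len(test_ds))  # 40000 5000 5000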

Training Procedure

Preprocessing

  • Tokenized using distilbert-base-uncased tokenizer.
  • Applied dynamic padding, truncation, and a max sequence length of 256.
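
A minimal sketch of that preprocessing, continuing from the split sketch above; padding is deferred to a DataCollatorWithPadding so that each batch is only padded to its longest sequence (dynamic padding).

from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate to 256 tokens; no padding here, since the collator
    # pads each batch dynamically at training time
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)
val_ds = val_ds.map(tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)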

Training Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 16
  • Epochs: 2
  • Optimizer: AdamW
  • Loss Function: Cross-Entropy Loss

Compute Infrastructure

  • Hardware: Google Colab T4 GPU
  • Precision: Mixed precision (fp16=True for efficiency)
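
Taken together, these settings map onto a standard Trainer setup. The sketch below reflects the hyperparameters listed above but is not the exact training script; AdamW and cross-entropy loss are the defaults that Trainer and AutoModelForSequenceClassification apply, so they need no explicit configuration.

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hyperparameters from the list above; AdamW is the default
# optimizer and cross-entropy the default loss for sequence
# classification with integer labels
args = TrainingArguments(
    output_dir="imdb-distilbert",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    fp16=True,  # mixed precision on the T4 GPU
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()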

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • The model was evaluated on a 5,000-sample test set from the IMDB dataset.

Metrics

  • Accuracy: 90.4%
  • Precision: 92.1%
  • Recall: 88.2%
  • F1-score: 90.0%
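
These are the standard binary-classification metrics. A minimal sketch of computing them with scikit-learn, assuming y_true and y_pred are integer label arrays (0 = negative, 1 = positive) collected from the test set with the prediction code shown earlier:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    # The positive class (label 1) is the reference class for
    # precision, recall, and F1
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1
    )
    print(f"Accuracy:  {acc:.1%}")
    print(f"Precision: {prec:.1%}")
    print(f"Recall:    {rec:.1%}")
    print(f"F1-score:  {f1:.1%}")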

Model Examination

  • The model performs well on general sentiment classification but may struggle with sarcasm, irony, or very short reviews.

Citation

If you use this model, please cite:

@article{salonen2025imdb-distilbert,
  title={Fine-tuned DistilBERT for Sentiment Analysis on IMDB Reviews},
  author={Nikke Salonen},
  year={2025}
}

Model Card Authors

  • Nikke Salonen

Contact

For questions or issues, contact [email protected].

