license: mit
datasets:
  - Yelp/yelp_review_full
metrics:
  - accuracy
base_model:
  - distilbert/distilbert-base-uncased
library_name: transformers
tags:
  - Sentiment Analysis
  - Text Classification
  - BERT
  - Yelp Reviews
  - Fine-tuned

Yelp Review Classifier

This model is a sentiment classification model for Yelp reviews, trained to predict a review's star rating (1 to 5 stars). It was fine-tuned from distilbert-base-uncased, the DistilBERT checkpoint published on Hugging Face, using a Yelp reviews dataset.

Model Details

  • Model Type: DistilBERT-based model for sequence classification
  • Base Model: distilbert-base-uncased
  • Number of Parameters: Approximately 66M
  • Training Dataset: The model was trained on a curated Yelp reviews dataset, labeled for star ratings (1 to 5 stars).
  • Fine-Tuning Task: Multi-class classification for Yelp reviews, predicting the star rating (from 1 to 5 stars) based on the content of the review.

Training Data

  • Dataset: Custom Yelp reviews dataset
  • Data Description: The dataset consists of Yelp reviews, labeled for star ratings (1 to 5 stars).
  • Preprocessing: Reviews were cleaned to remove unwanted characters and URLs.
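
The card does not list the exact cleaning rules, so the following is only a minimal sketch of what such a preprocessing step might look like. The function name `clean_review` and all three regular expressions are assumptions made for illustration, not the rules used to train this model.

```python
import re

# Assumed patterns: the model card does not spell out its cleaning rules.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Keep word characters, whitespace, and common punctuation; drop everything else.
UNWANTED_RE = re.compile(r"[^\w\s.,!?'\"()$%&:;/-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Strip URLs and unusual characters, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = UNWANTED_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "Great tacos!!\n\nMenu: https://example.com/menu \u2605\u2605\u2605"
print(clean_review(raw))  # Great tacos!! Menu:
```

If inputs at inference time are cleaned differently from the training data, predictions may shift, so it is worth applying the same cleaning in both places once the actual rules are known.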

Training Details

  • Training Framework: Hugging Face Transformers and PyTorch
  • Learning Rate: 2e-5
  • Epochs: 6
  • Batch Size: 16
  • Optimizer: AdamW
  • Training Time: Approximately 2 hours on a GPU
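
To make the hyperparameters above concrete, here is a small sketch that collects them in one place and estimates how many optimizer updates a run would take. The `FineTuneConfig` class and the training-set size are hypothetical. The card does not state the real dataset size, and the calculation assumes a single device with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values stated in the Training Details list.
    learning_rate: float = 2e-5
    epochs: int = 6
    batch_size: int = 16
    optimizer: str = "AdamW"

    def total_steps(self, num_examples: int) -> int:
        """Optimizer updates over the full run, assuming no gradient accumulation."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


config = FineTuneConfig()
# 50_000 is a made-up training-set size; the card does not state the real one.
print(config.total_steps(num_examples=50_000))  # 18750
```

If training goes through the Transformers `Trainer`, these values would correspond to the `learning_rate`, `num_train_epochs`, and `per_device_train_batch_size` fields of `TrainingArguments`.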

Usage

To run inference with the model:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon."
]

# Map prediction to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars"
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Get predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]

    print(f"Rating: {predicted_rating}\n")
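
The loop above keeps only the top class. As an optional extension, the sketch below turns a row of logits into probabilities so that a confidence score and a probability-weighted star rating can be reported alongside the label. It is written in plain Python to stay self-contained; the logits are made up for illustration, and the helper names `softmax` and `summarize` are not part of the model's API.

```python
import math

STAR_LABELS = ["1 Star", "2 Stars", "3 Stars", "4 Stars", "5 Stars"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Return (top label, its probability, probability-weighted star rating)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    expected_stars = sum((i + 1) * p for i, p in enumerate(probs))
    return STAR_LABELS[best], probs[best], expected_stars


# Made-up logits for illustration; real values would come from
# outputs.logits[0].tolist() inside the loop above.
label, confidence, expected = summarize([-1.2, -0.3, 0.4, 2.1, 1.7])
print(f"{label} (p={confidence:.2f}, expected stars={expected:.2f})")
```

With the tensors from the loop, `torch.softmax(outputs.logits, dim=-1)` computes the same probabilities directly.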

Citation

If you use this model in your research, please cite the following:

@misc{kmack_yelp_review_classifier,
  author = {Kmack},
  title = {YELP-Review_Classifier},
  year = {2024},
  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}