---
license: mit
datasets:
- Yelp/yelp_review_full
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- Sentiment Analysis
- Text Classification
- BERT
- Yelp Reviews
- Fine-tuned
---
Yelp Review Classifier
This model is a sentiment classification model for Yelp reviews, trained to predict a review's star rating (1 to 5 stars) from its text. It was fine-tuned from distilbert-base-uncased, the uncased base DistilBERT checkpoint from Hugging Face, on a Yelp reviews dataset.
Model Details
- Model Type: DistilBERT-based model for sequence classification
- Model Architecture: distilbert-base-uncased
- Number of Parameters: Approximately 66M parameters
- Training Dataset: The model was trained on a curated Yelp reviews dataset, labeled for star ratings (1 to 5 stars).
- Fine-Tuning Task: Multi-class classification for Yelp reviews, predicting the star rating (from 1 to 5 stars) based on the content of the review.
Training Data
- Dataset: Custom Yelp reviews dataset
- Data Description: The dataset consists of Yelp reviews, labeled for star ratings (1 to 5 stars).
- Preprocessing: The dataset was preprocessed by cleaning the reviews to remove unwanted characters and URLs.
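The card does not list the exact cleaning rules. As a rough illustration only, the sketch below assumes regular-expression rules: URLs (starting with `http://`, `https://` or `www.`) are replaced by a space, characters other than word characters, whitespace and basic punctuation are dropped, and runs of whitespace are collapsed. `clean_review` and both patterns are hypothetical, not the author's preprocessing code.

```python
import re

# Hypothetical cleaning rules: the card does not specify the real patterns.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Keep word characters (Unicode-aware), whitespace and basic punctuation.
UNWANTED_CHARS = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and unwanted characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = UNWANTED_CHARS.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


print(clean_review("Great tacos!! See https://example.com/menu ★★★★★"))
# Great tacos!! See
```

Because `\w` is Unicode-aware in Python 3, accented letters such as the "é" in "café" survive. The sketch does not lowercase, since the uncased DistilBERT tokenizer already does that.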
Training Details
- Training Framework: Hugging Face Transformers and PyTorch
- Learning Rate: 2e-5
- Epochs: 6
- Batch Size: 16
- Optimizer: AdamW
- Training Time: Approximately 2 hours on a GPU
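The hyperparameters above fix the batch size and epoch count, but the card does not give the training-set size, so the sketch below takes it as an input when counting AdamW updates. It assumes one update per batch (no gradient accumulation) and, for the learning-rate helper, linear decay to zero with no warmup, which is the default schedule of the Hugging Face `Trainer`; the card does not say which schedule was actually used. The function names and the 10,000-example figure are hypothetical.

```python
import math

# Hyperparameters as listed in this card.
LEARNING_RATE = 2e-5
NUM_EPOCHS = 6
BATCH_SIZE = 16


def total_optimizer_steps(num_train_examples: int,
                          batch_size: int = BATCH_SIZE,
                          epochs: int = NUM_EPOCHS) -> int:
    """Number of AdamW updates, assuming one update per batch
    (no gradient accumulation) and the final partial batch kept."""
    steps_per_epoch = math.ceil(num_train_examples / batch_size)
    return steps_per_epoch * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE) -> float:
    """Learning rate at `step`, assuming linear decay to 0 with no warmup."""
    return base_lr * max(0.0, 1 - step / total_steps)


# Hypothetical training-set size; the card does not state the real one.
steps = total_optimizer_steps(10_000)
print(steps)  # 3750
print(linear_lr(steps // 2, steps))
```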
Usage
To use the model for inference, you can use the following code:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon.",
]

# Map predicted class indices (0 to 4) to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars",
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text (truncated to the model's maximum length)
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Run a forward pass without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]
    print(f"Review: {review}\nRating: {predicted_rating}\n")
```
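The loop above keeps only the arg-max class. When a confidence score or a fractional rating is useful, the logits can be turned into probabilities with a softmax. The helpers below are a hypothetical pure-Python sketch; inside the loop they could be fed `outputs.logits[0].tolist()`. The "expected stars" value is an illustrative summary only, not something the model was trained to output.

```python
from __future__ import annotations

import math

STAR_LABELS = ["1 Star", "2 Stars", "3 Stars", "4 Stars", "5 Stars"]


def star_probabilities(logits: list[float]) -> list[float]:
    """Numerically stable softmax over the five class logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_stars(probs: list[float]) -> float:
    """Probability-weighted star rating; class index i means i + 1 stars."""
    return sum((i + 1) * p for i, p in enumerate(probs))


# Made-up logits for illustration only
probs = star_probabilities([-2.0, -1.0, 0.5, 2.0, 1.0])
best = probs.index(max(probs))
print(STAR_LABELS[best], f"p={probs[best]:.2f}", f"expected={expected_stars(probs):.2f}")
```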
Citation
If you use this model in your research, please cite the following:
author = {Kmack},
title = {YELP-Review_Classifier},
year = {2024},
url = {https://huggingface.co/kmack/YELP-Review_Classifier}
}