---
license: mit
datasets:
- Yelp/yelp_review_full
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- Sentiment Analysis
- Text Classification
- BERT
- Yelp Reviews
- Fine-tuned
---

# Yelp Review Classifier

This model is a sentiment classification model for Yelp reviews, trained to predict a review's **star rating (1 to 5 stars)**. It was fine-tuned from the [`distilbert-base-uncased`](https://huggingface.co./distilbert/distilbert-base-uncased) DistilBERT checkpoint from Hugging Face on a Yelp reviews dataset.

## Model Details

- **Model Type**: DistilBERT-based model for sequence classification
- **Model Architecture**: `distilbert-base-uncased`
- **Number of Parameters**: Approximately 66M
- **Training Dataset**: A curated Yelp reviews dataset, labeled with star ratings (1 to 5 stars).
- **Fine-Tuning Task**: Multi-class classification, predicting the star rating (1 to 5 stars) from the text of the review.

## Training Data

- **Dataset**: Custom Yelp reviews dataset
- **Data Description**: Yelp reviews labeled with star ratings (1 to 5 stars).
- **Preprocessing**: Reviews were cleaned to remove unwanted characters and URLs.

## Training Details

- **Training Framework**: Hugging Face Transformers and PyTorch
- **Learning Rate**: 2e-5
- **Epochs**: 6
- **Batch Size**: 16
- **Optimizer**: AdamW
- **Training Time**: Approximately 2 hours on a GPU

A sketch of how these settings could be reproduced is included at the end of this card.

## Usage

To run inference with the model, use the following code:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer from Hugging Face
model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# List of reviews for prediction
reviews = [
    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon."
]

# Map prediction indices to star ratings
label_map = {
    0: "1 Star",
    1: "2 Stars",
    2: "3 Stars",
    3: "4 Stars",
    4: "5 Stars"
}

# Iterate over each review and get the prediction
for review in reviews:
    # Tokenize the input text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Get predictions
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the predicted label (0 to 4 for star ratings)
    prediction = torch.argmax(outputs.logits, dim=-1).item()

    # Map prediction to star rating
    predicted_rating = label_map[prediction]
    print(f"Rating: {predicted_rating}\n")
```

## Citation

If you use this model in your research, please cite the following:

```bibtex
@misc{YELP-Review_Classifier,
  author = {Kmack},
  title  = {YELP-Review_Classifier},
  year   = {2024},
  url    = {https://huggingface.co./kmack/YELP-Review_Classifier}
}
```
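
## Fine-Tuning Sketch

The original training script is not included in this repository. The sketch below shows how a comparable fine-tune could be reproduced with the hyperparameters listed in the Training Details section (learning rate 2e-5, 6 epochs, batch size 16, AdamW) on the `Yelp/yelp_review_full` dataset referenced in the card metadata. The cleaning regexes, max sequence length, output path, and evaluation setup are illustrative assumptions, not the original configuration.

```python
import re
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical cleaning step: the card only states that unwanted characters and URLs were removed.
def clean_text(text: str) -> str:
    text = re.sub(r"http\S+|www\.\S+", " ", text)     # strip URLs
    text = re.sub(r"[^A-Za-z0-9.,!?'\s]", " ", text)  # strip unwanted characters
    return re.sub(r"\s+", " ", text).strip()

base_model = "distilbert/distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=5)

# yelp_review_full labels reviews 0-4, matching the 1-5 star mapping used in the Usage example.
dataset = load_dataset("Yelp/yelp_review_full")

def preprocess(batch):
    texts = [clean_text(t) for t in batch["text"]]
    return tokenizer(texts, truncation=True, max_length=256)  # max_length is an assumption

tokenized = dataset.map(preprocess, batched=True, remove_columns=["text"])

# Hyperparameters taken from the Training Details section; everything else is a guess.
args = TrainingArguments(
    output_dir="yelp-review-classifier",  # hypothetical output path
    learning_rate=2e-5,
    num_train_epochs=6,
    per_device_train_batch_size=16,
    optim="adamw_torch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)

trainer.train()
```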