🚀 DistilBERT-Based Multilingual Sentiment Classification Model

NEWS!

  • 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.

Model Details

  • Model Name: tabularisai/multilingual-sentiment-analysis
  • Base Model: distilbert/distilbert-base-multilingual-cased
  • Task: Text Classification (Sentiment Analysis)
  • Languages: Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
  • Number of Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive); see the label-mapping sketch after this list.
  • Usage:
    • Social media analysis
    • Customer feedback analysis
    • Product reviews classification
    • Brand monitoring
    • Market research
    • Customer service optimization
    • Competitive intelligence
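
The five classes map to indices 0–4 in the order listed above. The mapping can usually be read directly from the model configuration; the snippet below is a minimal sketch and assumes the uploaded config populates id2label (if it does not, use the explicit mapping shown in the "How to Use" section):

from transformers import AutoConfig

# Load only the configuration (no weights) and inspect the label mapping
config = AutoConfig.from_pretrained("tabularisai/multilingual-sentiment-analysis")
print(config.id2label)
# Assumed output: {0: 'Very Negative', 1: 'Negative', 2: 'Neutral', 3: 'Positive', 4: 'Very Positive'}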

Model Description

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.

Training Data

The model was trained exclusively on synthetic multilingual data generated by large language models, providing broad coverage of sentiment expressions across the supported languages.

Training Procedure

  • Fine-tuned for 3.5 epochs.
  • Achieved approximately 0.93 train_acc_off_by_one on the validation dataset; with this metric, a prediction counts as correct if it falls within one class of the true label (see the sketch below).
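
For reference, the off-by-one metric can be computed as below. This is a generic sketch of the metric as described, not the actual training code for this model:

import torch

def off_by_one_accuracy(predicted_ids, true_ids):
    # A prediction counts as correct if it lands on the true class or an
    # adjacent one (e.g. Positive predicted for a Very Positive example).
    return (torch.abs(predicted_ids - true_ids) <= 1).float().mean().item()

# 3 of the 4 predictions below are within one class of the labels -> 0.75
print(off_by_one_accuracy(torch.tensor([0, 2, 4, 1]), torch.tensor([0, 3, 1, 1])))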

Intended Use

Ideal for:

  • Multilingual social media monitoring
  • International customer feedback analysis
  • Global product review sentiment classification
  • Worldwide brand sentiment tracking

How to Use

Using the transformers pipeline API, it takes only a few lines:

from transformers import pipeline

# Load the classification pipeline with the specified model
pipe = pipeline("text-classification", model="tabularisai/multilingual-sentiment-analysis")

# Classify a new sentence
sentence = "I love this product! It's amazing and works perfectly."
result = pipe(sentence)

# Print the result
print(result)
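
The pipeline also accepts a list of texts, and a top_k argument can return the score for every class rather than only the top label. This is a usage sketch; top_k=None is assumed to be available in your transformers version (older releases used return_all_scores=True instead):

# Classify several texts at once and keep the scores for all five classes
batch = ["This is great!", "El servicio fue terrible y muy lento.", "天気はまあまあです。"]
results = pipe(batch, top_k=None)

for text, scores in zip(batch, results):
    print(text, scores)  # each result is a list of {"label": ..., "score": ...} dicts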

Below is a Python example of how to use the multilingual sentiment model without the pipeline API:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(texts):
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
    # English
    "I absolutely love the new design of this app!", "The customer service was disappointing.", "The weather is fine, nothing special.",
    # Chinese
    "่ฟ™ๅฎถ้คๅŽ…็š„่œๅ‘ณ้“้žๅธธๆฃ’๏ผ", "ๆˆ‘ๅฏนไป–็š„ๅ›ž็ญ”ๅพˆๅคฑๆœ›ใ€‚", "ๅคฉๆฐ”ไปŠๅคฉไธ€่ˆฌใ€‚",
    # Spanish
    "ยกMe encanta cรณmo quedรณ la decoraciรณn!", "El servicio fue terrible y muy lento.", "El libro estuvo mรกs o menos.",
    # Arabic
    "ุงู„ุฎุฏู…ุฉ ููŠ ู‡ุฐุง ุงู„ูู†ุฏู‚ ุฑุงุฆุนุฉ ุฌุฏู‹ุง!", "ู„ู… ูŠุนุฌุจู†ูŠ ุงู„ุทุนุงู… ููŠ ู‡ุฐุง ุงู„ู…ุทุนู….", "ูƒุงู†ุช ุงู„ุฑุญู„ุฉ ุนุงุฏูŠุฉใ€‚",
    # Ukrainian
    "ะœะตะฝั– ะดัƒะถะต ัะฟะพะดะพะฑะฐะปะฐัั ั†ั ะฒะธัั‚ะฐะฒะฐ!", "ะžะฑัะปัƒะณะพะฒัƒะฒะฐะฝะฝั ะฑัƒะปะพ ะถะฐั…ะปะธะฒะธะผ.", "ะšะฝะธะณะฐ ะฑัƒะปะฐ ะฟะพัะตั€ะตะดะฝัŒะพัŽใ€‚",
    # Hindi
    "เคฏเคน เคœเค—เคน เคธเคš เคฎเฅ‡เค‚ เค…เคฆเฅเคญเฅเคค เคนเฅˆ!", "เคฏเคน เค…เคจเฅเคญเคต เคฌเคนเฅเคค เค–เคฐเคพเคฌ เคฅเคพเฅค", "เคซเคฟเคฒเฅเคฎ เค เฅ€เค•-เค เคพเค• เคฅเฅ€เฅค",
    # Bengali
    "เฆเฆ–เฆพเฆจเฆ•เฆพเฆฐ เฆชเฆฐเฆฟเฆฌเง‡เฆถ เฆ…เฆธเฆพเฆงเฆพเฆฐเฆฃ!", "เฆธเง‡เฆฌเฆพเฆฐ เฆฎเฆพเฆจ เฆเฆ•เง‡เฆฌเฆพเฆฐเง‡เฆ‡ เฆ–เฆพเฆฐเฆพเฆชเฅค", "เฆ–เฆพเฆฌเฆพเฆฐเฆŸเฆพ เฆฎเง‹เฆŸเฆพเฆฎเงเฆŸเฆฟ เฆ›เฆฟเฆฒเฅค",
    # Portuguese
    "Este livro รฉ fantรกstico! Eu aprendi muitas coisas novas e inspiradoras.", 
    "Nรฃo gostei do produto, veio quebrado.", "O filme foi ok, nada de especial.",
    # Japanese
    "ใ“ใฎใƒฌใ‚นใƒˆใƒฉใƒณใฎๆ–™็†ใฏๆœฌๅฝ“ใซ็พŽๅ‘ณใ—ใ„ใงใ™๏ผ", "ใ“ใฎใƒ›ใƒ†ใƒซใฎใ‚ตใƒผใƒ“ใ‚นใฏใŒใฃใ‹ใ‚Šใ—ใพใ—ใŸใ€‚", "ๅคฉๆฐ—ใฏใพใ‚ใพใ‚ใงใ™ใ€‚",
    # Russian
    "ะฏ ะฒ ะฒะพัั‚ะพั€ะณะต ะพั‚ ัั‚ะพะณะพ ะฝะพะฒะพะณะพ ะณะฐะดะถะตั‚ะฐ!", "ะญั‚ะพั‚ ัะตั€ะฒะธั ะพัั‚ะฐะฒะธะป ัƒ ะผะตะฝั ั‚ะพะปัŒะบะพ ั€ะฐะทะพั‡ะฐั€ะพะฒะฐะฝะธะต.", "ะ’ัั‚ั€ะตั‡ะฐ ะฑั‹ะปะฐ ะพะฑั‹ั‡ะฝะพะน, ะฝะธั‡ะตะณะพ ะพัะพะฑะตะฝะฝะพะณะพ.",
    # French
    "J'adore ce restaurant, c'est excellent !", "L'attente รฉtait trop longue et frustrante.", "Le film รฉtait moyen, sans plus.",
    # Turkish
    "Bu otelin manzarasฤฑna bayฤฑldฤฑm!", "รœrรผn tam bir hayal kฤฑrฤฑklฤฑฤŸฤฑydฤฑ.", "Konser fena deฤŸildi, ortalamaydฤฑ.",
    # Italian
    "Adoro questo posto, รจ fantastico!", "Il servizio clienti รจ stato pessimo.", "La cena era nella media.",
    # Polish
    "Uwielbiam tฤ™ restauracjฤ™, jedzenie jest ล›wietne!", "Obsล‚uga klienta byล‚a rozczarowujฤ…ca.", "Pogoda jest w porzฤ…dku, nic szczegรณlnego.",
    # Tagalog
    "Ang ganda ng lugar na ito, sobrang aliwalas!", "Hindi maganda ang serbisyo nila dito.", "Maayos lang ang palabas, walang espesyal.",
    # Dutch
    "Ik ben echt blij met mijn nieuwe aankoop!", "De klantenservice was echt slecht.", "De presentatie was gewoon okรฉ, niet bijzonder.",
    # Malay
    "Saya suka makanan di sini, sangat sedap!", "Pengalaman ini sangat mengecewakan.", "Hari ini cuacanya biasa sahaja.",
    # Korean
    "์ด ๊ฐ€๊ฒŒ์˜ ์ผ€์ดํฌ๋Š” ์ •๋ง ๋ง›์žˆ์–ด์š”!", "์„œ๋น„์Šค๊ฐ€ ๋„ˆ๋ฌด ๋ณ„๋กœ์˜€์–ด์š”.", "๋‚ ์”จ๊ฐ€ ๊ทธ์ € ๊ทธ๋ ‡๋„ค์š”.",
    # Swiss German
    "Ich find dรค Service i de Beiz mega guet!", "Dรคs Esรค het mir nรถd gfalle.", "D Wรคtter hรผt isch so naja."
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")

Ethical Considerations

Synthetic training data can reduce exposure to some real-world biases, but it does not eliminate them; validate the model on real-world data from your own domain before deploying it.

Citation

Will be included.

Contact

For inquiries about data, private APIs, or improved models, contact [email protected]

tabularis.ai
