🚀 DistilBERT-Based Multilingual Sentiment Classification Model

NEWS!

  • 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.

Model Details

  • Model Name: tabularisai/multilingual-sentiment-analysis
  • Base Model: distilbert/distilbert-base-multilingual-cased
  • Task: Text Classification (Sentiment Analysis)
  • Languages: Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
  • Number of Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive); see the label-mapping sketch after this list.
  • Usage:
    • Social media analysis
    • Customer feedback analysis
    • Product reviews classification
    • Brand monitoring
    • Market research
    • Customer service optimization
    • Competitive intelligence
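
The five classes map to indices 0–4 in the order listed above. The mapping can usually be read directly from the model configuration; the snippet below is a minimal sketch and assumes the uploaded config populates id2label (if it does not, use the explicit mapping shown in the "How to Use" section):

from transformers import AutoConfig

# Load only the configuration (no weights) and inspect the label mapping
config = AutoConfig.from_pretrained("tabularisai/multilingual-sentiment-analysis")
print(config.id2label)
# Assumed output: {0: 'Very Negative', 1: 'Negative', 2: 'Neutral', 3: 'Positive', 4: 'Very Positive'}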

Model Description

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.

Training Data

The model was trained exclusively on synthetic multilingual data generated by large language models, providing broad coverage of sentiment expressions across the supported languages.

Training Procedure

  • Fine-tuned for 3.5 epochs.
  • Achieved approximately 0.93 train_acc_off_by_one on the validation dataset; with this metric, a prediction counts as correct if it falls within one class of the true label (see the sketch below).
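
For reference, the off-by-one metric can be computed as below. This is a generic sketch of the metric as described, not the actual training code for this model:

import torch

def off_by_one_accuracy(predicted_ids, true_ids):
    # A prediction counts as correct if it lands on the true class or an
    # adjacent one (e.g. Positive predicted for a Very Positive example).
    return (torch.abs(predicted_ids - true_ids) <= 1).float().mean().item()

# 3 of the 4 predictions below are within one class of the labels -> 0.75
print(off_by_one_accuracy(torch.tensor([0, 2, 4, 1]), torch.tensor([0, 3, 1, 1])))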

Intended Use

Ideal for:

  • Multilingual social media monitoring
  • International customer feedback analysis
  • Global product review sentiment classification
  • Worldwide brand sentiment tracking

How to Use

Using the transformers pipeline API, it takes only a few lines:

from transformers import pipeline

# Load the classification pipeline with the specified model
pipe = pipeline("text-classification", model="tabularisai/multilingual-sentiment-analysis")

# Classify a new sentence
sentence = "I love this product! It's amazing and works perfectly."
result = pipe(sentence)

# Print the result
print(result)
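
The pipeline also accepts a list of texts, and a top_k argument can return the score for every class rather than only the top label. This is a usage sketch; top_k=None is assumed to be available in your transformers version (older releases used return_all_scores=True instead):

# Classify several texts at once and keep the scores for all five classes
batch = ["This is great!", "El servicio fue terrible y muy lento.", "天気はまあまあです。"]
results = pipe(batch, top_k=None)

for text, scores in zip(batch, results):
    print(text, scores)  # each result is a list of {"label": ..., "score": ...} dicts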

Below is a Python example of how to use the multilingual sentiment model without the pipeline API:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(texts):
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
    # English
    "I absolutely love the new design of this app!", "The customer service was disappointing.", "The weather is fine, nothing special.",
    # Chinese
    "่ฟ™ๅฎถ้คๅŽ…็š„่œๅ‘ณ้“้žๅธธๆฃ’๏ผ", "ๆˆ‘ๅฏนไป–็š„ๅ›ž็ญ”ๅพˆๅคฑๆœ›ใ€‚", "ๅคฉๆฐ”ไปŠๅคฉไธ€่ˆฌใ€‚",
    # Spanish
    "ยกMe encanta cรณmo quedรณ la decoraciรณn!", "El servicio fue terrible y muy lento.", "El libro estuvo mรกs o menos.",
    # Arabic
    "ุงู„ุฎุฏู…ุฉ ููŠ ู‡ุฐุง ุงู„ูู†ุฏู‚ ุฑุงุฆุนุฉ ุฌุฏู‹ุง!", "ู„ู… ูŠุนุฌุจู†ูŠ ุงู„ุทุนุงู… ููŠ ู‡ุฐุง ุงู„ู…ุทุนู….", "ูƒุงู†ุช ุงู„ุฑุญู„ุฉ ุนุงุฏูŠุฉใ€‚",
    # Ukrainian
    "ะœะตะฝั– ะดัƒะถะต ัะฟะพะดะพะฑะฐะปะฐัั ั†ั ะฒะธัั‚ะฐะฒะฐ!", "ะžะฑัะปัƒะณะพะฒัƒะฒะฐะฝะฝั ะฑัƒะปะพ ะถะฐั…ะปะธะฒะธะผ.", "ะšะฝะธะณะฐ ะฑัƒะปะฐ ะฟะพัะตั€ะตะดะฝัŒะพัŽใ€‚",
    # Hindi
    "เคฏเคน เคœเค—เคน เคธเคš เคฎเฅ‡เค‚ เค…เคฆเฅเคญเฅเคค เคนเฅˆ!", "เคฏเคน เค…เคจเฅเคญเคต เคฌเคนเฅเคค เค–เคฐเคพเคฌ เคฅเคพเฅค", "เคซเคฟเคฒเฅเคฎ เค เฅ€เค•-เค เคพเค• เคฅเฅ€เฅค",
    # Bengali
    "เฆเฆ–เฆพเฆจเฆ•เฆพเฆฐ เฆชเฆฐเฆฟเฆฌเง‡เฆถ เฆ…เฆธเฆพเฆงเฆพเฆฐเฆฃ!", "เฆธเง‡เฆฌเฆพเฆฐ เฆฎเฆพเฆจ เฆเฆ•เง‡เฆฌเฆพเฆฐเง‡เฆ‡ เฆ–เฆพเฆฐเฆพเฆชเฅค", "เฆ–เฆพเฆฌเฆพเฆฐเฆŸเฆพ เฆฎเง‹เฆŸเฆพเฆฎเงเฆŸเฆฟ เฆ›เฆฟเฆฒเฅค",
    # Portuguese
    "Este livro รฉ fantรกstico! Eu aprendi muitas coisas novas e inspiradoras.", 
    "Nรฃo gostei do produto, veio quebrado.", "O filme foi ok, nada de especial.",
    # Japanese
    "ใ“ใฎใƒฌใ‚นใƒˆใƒฉใƒณใฎๆ–™็†ใฏๆœฌๅฝ“ใซ็พŽๅ‘ณใ—ใ„ใงใ™๏ผ", "ใ“ใฎใƒ›ใƒ†ใƒซใฎใ‚ตใƒผใƒ“ใ‚นใฏใŒใฃใ‹ใ‚Šใ—ใพใ—ใŸใ€‚", "ๅคฉๆฐ—ใฏใพใ‚ใพใ‚ใงใ™ใ€‚",
    # Russian
    "ะฏ ะฒ ะฒะพัั‚ะพั€ะณะต ะพั‚ ัั‚ะพะณะพ ะฝะพะฒะพะณะพ ะณะฐะดะถะตั‚ะฐ!", "ะญั‚ะพั‚ ัะตั€ะฒะธั ะพัั‚ะฐะฒะธะป ัƒ ะผะตะฝั ั‚ะพะปัŒะบะพ ั€ะฐะทะพั‡ะฐั€ะพะฒะฐะฝะธะต.", "ะ’ัั‚ั€ะตั‡ะฐ ะฑั‹ะปะฐ ะพะฑั‹ั‡ะฝะพะน, ะฝะธั‡ะตะณะพ ะพัะพะฑะตะฝะฝะพะณะพ.",
    # French
    "J'adore ce restaurant, c'est excellent !", "L'attente รฉtait trop longue et frustrante.", "Le film รฉtait moyen, sans plus.",
    # Turkish
    "Bu otelin manzarasฤฑna bayฤฑldฤฑm!", "รœrรผn tam bir hayal kฤฑrฤฑklฤฑฤŸฤฑydฤฑ.", "Konser fena deฤŸildi, ortalamaydฤฑ.",
    # Italian
    "Adoro questo posto, รจ fantastico!", "Il servizio clienti รจ stato pessimo.", "La cena era nella media.",
    # Polish
    "Uwielbiam tฤ™ restauracjฤ™, jedzenie jest ล›wietne!", "Obsล‚uga klienta byล‚a rozczarowujฤ…ca.", "Pogoda jest w porzฤ…dku, nic szczegรณlnego.",
    # Tagalog
    "Ang ganda ng lugar na ito, sobrang aliwalas!", "Hindi maganda ang serbisyo nila dito.", "Maayos lang ang palabas, walang espesyal.",
    # Dutch
    "Ik ben echt blij met mijn nieuwe aankoop!", "De klantenservice was echt slecht.", "De presentatie was gewoon okรฉ, niet bijzonder.",
    # Malay
    "Saya suka makanan di sini, sangat sedap!", "Pengalaman ini sangat mengecewakan.", "Hari ini cuacanya biasa sahaja.",
    # Korean
    "์ด ๊ฐ€๊ฒŒ์˜ ์ผ€์ดํฌ๋Š” ์ •๋ง ๋ง›์žˆ์–ด์š”!", "์„œ๋น„์Šค๊ฐ€ ๋„ˆ๋ฌด ๋ณ„๋กœ์˜€์–ด์š”.", "๋‚ ์”จ๊ฐ€ ๊ทธ์ € ๊ทธ๋ ‡๋„ค์š”.",
    # Swiss German
    "Ich find dรค Service i de Beiz mega guet!", "Dรคs Esรค het mir nรถd gfalle.", "D Wรคtter hรผt isch so naja."
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")

Ethical Considerations

Synthetic training data can reduce exposure to some real-world biases, but it does not eliminate them; validate the model on real-world data from your own domain before deploying it.

Citation

Will be included.

Contact

For inquiries about data, private APIs, or improved models, contact [email protected]

tabularis.ai
