distilbert-based Multilingual Sentiment Classification Model
NEWS!
- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.
Model Details
- Model Name: tabularisai/multilingual-sentiment-analysis
- Base Model: distilbert/distilbert-base-multilingual-cased
- Task: Text Classification (Sentiment Analysis)
- Languages: Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- Number of Classes: 5 (Very Negative, Negative, Neutral, Positive, Very Positive)
- Usage:
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence
Model Description
This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
Training Data
The model was trained exclusively on synthetic multilingual data generated by LLMs, chosen to cover a wide range of sentiment expressions across the supported languages.
Training Procedure
- Fine-tuned for 3.5 epochs.
- Achieved a train_acc_off_by_one of approximately 0.93 on the validation dataset.
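The metric name train_acc_off_by_one is not defined in this card. Assuming it denotes off-by-one accuracy, that is, the share of predictions that land on the true class or on an adjacent class of the five-point scale, it could be computed roughly as follows. The function name and the toy labels are illustrative and are not taken from the training code.

```python
def off_by_one_accuracy(y_true, y_pred):
    """Share of predictions whose class index is within one step of the true index.

    Assumed definition; the model card does not spell the metric out.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    hits = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy labels made up for illustration, using indices 0..4 for
# Very Negative .. Very Positive.
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [1, 1, 4, 3, 3, 0]

score = off_by_one_accuracy(y_true, y_pred)
print(score)
```

Under this definition, predicting Positive for a Very Positive text still counts as a hit, so the figure is more forgiving than exact-match accuracy.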
Intended Use
Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking
How to Use
Using pipelines, it takes only 4 lines:
from transformers import pipeline
# Load the classification pipeline with the specified model
pipe = pipeline("text-classification", model="tabularisai/multilingual-sentiment-analysis")
# Classify a new sentence
sentence = "I love this product! It's amazing and works perfectly."
result = pipe(sentence)
# Print the result
print(result)
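A text-classification pipeline returns a list of dictionaries, each with a label and a score. The sketch below shows one way to tally such results for a batch of texts. The sample_results list is written by hand so the snippet runs without downloading the model, the label strings are assumed to match the class names listed under Model Details (model.config.id2label is the place to check), and summarize and min_score are hypothetical names.

```python
from collections import Counter

# Hand-written stand-in for pipeline output; the scores are invented.
sample_results = [
    {"label": "Very Positive", "score": 0.91},
    {"label": "Negative", "score": 0.58},
    {"label": "Very Positive", "score": 0.77},
    {"label": "Neutral", "score": 0.49},
]


def summarize(results, min_score=0.5):
    """Count predicted labels, skipping predictions below a confidence threshold."""
    return Counter(r["label"] for r in results if r["score"] >= min_score)


summary = summarize(sample_results)
print(summary.most_common())
```

Dropping low-confidence predictions is a design choice rather than a requirement; setting min_score to 0 keeps every prediction.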
Below is a Python example of how to use the multilingual sentiment model without pipelines:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(texts):
    # Tokenize the whole batch, padding to the longest text and truncating at 512 tokens
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so gradients are not needed
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert logits to class probabilities and pick the most likely class per text
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return [sentiment_map[p] for p in torch.argmax(probabilities, dim=-1).tolist()]

texts = [
    # English
    "I absolutely love the new design of this app!", "The customer service was disappointing.", "The weather is fine, nothing special.",
    # Chinese
    "这家餐厅的菜味道非常棒！", "我对他的回答很失望。", "天气今天一般。",
    # Spanish
    "¡Me encanta cómo quedó la decoración!", "El servicio fue terrible y muy lento.", "El libro estuvo más o menos.",
    # Arabic
    "الخدمة في هذا الفندق رائعة جدًا!", "لم يعجبني الطعام في هذا المطعم.", "كانت الرحلة عادية。",
    # Ukrainian
    "Мені дуже сподобалася ця вистава!", "Обслуговування було жахливим.", "Книга була посередньою。",
    # Hindi
    "यह जगह सच में अद्भुत है!", "यह अनुभव बहुत खराब था।", "फिल्म ठीक-ठाक थी।",
    # Bengali
    "এখানকার পরিবেশ অসাধারণ!", "সেবার মান একেবারেই খারাপ।", "খাবারটা মোটামুটি ছিল।",
    # Portuguese
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras.",
    "Não gostei do produto, veio quebrado.", "O filme foi ok, nada de especial.",
    # Japanese
    "このレストランの料理は本当に美味しいです！", "このホテルのサービスはがっかりしました。", "天気はまあまあです。",
    # Russian
    "Я в восторге от этого нового гаджета!", "Этот сервис оставил у меня только разочарование.", "Встреча была обычной, ничего особенного.",
    # French
    "J'adore ce restaurant, c'est excellent !", "L'attente était trop longue et frustrante.", "Le film était moyen, sans plus.",
    # Turkish
    "Bu otelin manzarasına bayıldım!", "Ürün tam bir hayal kırıklığıydı.", "Konser fena değildi, ortalamaydı.",
    # Italian
    "Adoro questo posto, è fantastico!", "Il servizio clienti è stato pessimo.", "La cena era nella media.",
    # Polish
    "Uwielbiam tę restaurację, jedzenie jest świetne!", "Obsługa klienta była rozczarowująca.", "Pogoda jest w porządku, nic szczególnego.",
    # Tagalog
    "Ang ganda ng lugar na ito, sobrang aliwalas!", "Hindi maganda ang serbisyo nila dito.", "Maayos lang ang palabas, walang espesyal.",
    # Dutch
    "Ik ben echt blij met mijn nieuwe aankoop!", "De klantenservice was echt slecht.", "De presentatie was gewoon oké, niet bijzonder.",
    # Malay
    "Saya suka makanan di sini, sangat sedap!", "Pengalaman ini sangat mengecewakan.", "Hari ini cuacanya biasa sahaja.",
    # Korean
    "이 가게의 케이크는 정말 맛있어요!", "서비스가 너무 별로였어요.", "날씨가 그저 그렇네요.",
    # Swiss German
    "Ich find dä Service i de Beiz mega guet!", "Däs Esä het mir nöd gfalle.", "D Wätter hüt isch so naja."
]

for text, sentiment in zip(texts, predict_sentiment(texts)):
    print(f"Text: {text}\nSentiment: {sentiment}\n")
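The predict_sentiment function keeps only the winning class and discards its probability. The sketch below reproduces the softmax and argmax steps in plain Python so that a confidence value can be reported alongside the label. The logits are made up for illustration and do not come from the model, and label_with_confidence is a hypothetical helper name.

```python
import math

SENTIMENT_MAP = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits):
    """Return the most likely sentiment label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_MAP[best], probs[best]


# Invented logits for a single text, one value per class.
made_up_logits = [-1.2, -0.3, 0.1, 2.4, 1.0]
label, confidence = label_with_confidence(made_up_logits)
print(label, round(confidence, 3))
```

With real model outputs, the same pair can be read from the probabilities tensor by calling torch.max(probabilities, dim=-1), which returns both the maximum values and their indices.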
Ethical Considerations
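Training on synthetic data can reduce some sources of bias, but it does not remove the need for validation: test the model on real-world data from your own domain before relying on its predictions.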
Citation
Will be included.
Contact
For inquiries, data, private APIs, or better models, contact [email protected]
tabularis.ai