XLM_RoBERTa-Multilingual-Clickbait-Detection

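This model is a fine-tuned version of xlm-roberta-large for multilingual clickbait detection, trained on the "clickbait_detection_dataset" and its machine translations (see "Training and evaluation data"). It achieves the following results on the evaluation set: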

  • Loss: 0.2192
  • Micro F1: 0.9759
  • Macro F1: 0.9758
  • Accuracy: 0.9759
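Micro F1 and accuracy are identical above because the two coincide for single-label classification, while macro F1 averages the per-class F1 scores without weighting by class size. A minimal pure-Python sketch of the three metrics follows; the toy labels and the 1 = clickbait encoding are assumptions for illustration, not data from this model.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1, accuracy) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[truth] += 1  # true class gets a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return micro, macro, accuracy


# Toy labels for illustration only (assumed encoding: 1 = clickbait, 0 = not clickbait).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
micro, macro, accuracy = f1_scores(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f} accuracy={accuracy:.4f}")
```

On imbalanced data the macro score drops below the micro score when the minority class is predicted less well, so near-identical micro and macro values, as reported here, suggest similar performance on both classes.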

Test set Macro-F1 scores (%)

  • Multilingual test set: 97.28
  • en test set: 97.83
  • el test set: 97.32
  • it test set: 97.54
  • es test set: 97.67
  • ro test set: 97.40
  • de test set: 97.40
  • fr test set: 96.90
  • pl test set: 96.18
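The per-language scores can be compared with a few lines of Python. In this small sketch the values are copied from the list above. The unweighted mean of the eight per-language scores comes out equal to the reported multilingual score, but the card does not say whether the multilingual figure was computed as such a mean or on a pooled test set.

```python
# Per-language Macro-F1 scores (%) as reported in the list above.
per_language = {
    "en": 97.83, "el": 97.32, "it": 97.54, "es": 97.67,
    "ro": 97.40, "de": 97.40, "fr": 96.90, "pl": 96.18,
}

mean_score = sum(per_language.values()) / len(per_language)
best = max(per_language, key=per_language.get)
worst = min(per_language, key=per_language.get)
spread = per_language[best] - per_language[worst]

print(f"mean={mean_score:.2f} best={best} worst={worst} spread={spread:.2f}")
# prints: mean=97.28 best=en worst=pl spread=1.65
```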

Intended uses & limitations

  • This model will be employed for an EU project.
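For readers who want to try the model, here is a minimal inference sketch using the Transformers `pipeline` API. The helper name `classify_headlines` is hypothetical, and the label strings returned depend on the checkpoint's configuration, which this card does not document, so the sketch returns them unchanged.

```python
MODEL_ID = "christinacdl/XLM_RoBERTa-Multilingual-Clickbait-Detection"


def classify_headlines(headlines, model_id=MODEL_ID):
    """Return one {'label': ..., 'score': ...} dict per headline.

    Downloads the model weights on first use; requires `transformers` and `torch`.
    """
    from transformers import pipeline  # imported lazily so the module loads without it

    classifier = pipeline("text-classification", model=model_id)
    return classifier(list(headlines))


# Example call (not run here because it downloads the checkpoint):
# results = classify_headlines([
#     "You won't believe what happened next",
#     "Parliament approves the 2024 budget",
# ])
# for result in results:
#     print(result["label"], round(result["score"], 4))
```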

Training and evaluation data

  • The English "clickbait_detection_dataset" was machine-translated into Greek, Italian, Spanish, Romanian, French and German using Opus-MT models.
  • The dataset was also translated from English into Polish using an M2M NMT model.
  • Both sets of NMT models were run through the "EasyNMT" library.
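The translation setup described above could look roughly like the following sketch, not the author's actual script. The helper `translate_texts` is hypothetical, and the M2M checkpoint name `m2m_100_418M` is an assumption: the card names only "M2M", not a model size.

```python
# Target language -> EasyNMT model name.
# "opus-mt" is EasyNMT's identifier for the Opus-MT models.
# "m2m_100_418M" is an ASSUMPTION: the card does not say which M2M checkpoint was used.
TRANSLATION_MODELS = {
    "el": "opus-mt",
    "it": "opus-mt",
    "es": "opus-mt",
    "ro": "opus-mt",
    "fr": "opus-mt",
    "de": "opus-mt",
    "pl": "m2m_100_418M",
}


def translate_texts(texts, target_lang, source_lang="en"):
    """Translate English texts into `target_lang` with the model mapped above.

    Requires the `easynmt` package; translation models are downloaded on first use.
    """
    from easynmt import EasyNMT  # imported lazily so the mapping is usable without it

    model = EasyNMT(TRANSLATION_MODELS[target_lang])
    return model.translate(list(texts), source_lang=source_lang, target_lang=target_lang)


# Example call (not run here because it downloads a translation model):
# greek = translate_texts(["10 tricks doctors don't want you to know"], "el")
```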

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 4
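The hyperparameters listed above map onto Transformers `TrainingArguments` roughly as sketched below. Mapping `train_batch_size` to `per_device_train_batch_size` assumes a single device, which is consistent with the reported total of 32 (16 × 2 accumulation steps). The output directory name is hypothetical.

```python
# Hyperparameters from the list above, expressed as TrainingArguments keyword arguments.
# ASSUMPTION: single-device training, so train_batch_size == per_device_train_batch_size.
HYPERPARAMS = dict(
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=4,
)


def effective_batch_size(params, num_devices=1):
    """Examples consumed per optimizer step across all devices."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_args(output_dir="clickbait-xlmr"):  # hypothetical directory name
    """Build TrainingArguments; requires `transformers` and `torch`."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


print(effective_batch_size(HYPERPARAMS))  # prints: 32
```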

Framework versions

  • Transformers 4.36.1
  • PyTorch 2.1.0+cu121
  • Datasets 2.13.1
  • Tokenizers 0.15.0

Model size

  • 560M params (Safetensors, tensor type F32)
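The listed parameter count and tensor type give a rough lower bound on the memory needed just to hold the weights. A small worked estimate follows; the F16 entry is included only for comparison and is hypothetical, since this card lists F32 only.

```python
NUM_PARAMS = 560_000_000  # "560M params" as listed (rounded figure)

# Bytes per parameter for each tensor type; F16 is shown for comparison only.
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def weight_size_gb(num_params, dtype):
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


print(f"F32: {weight_size_gb(NUM_PARAMS, 'F32'):.2f} GB")  # prints: F32: 2.24 GB
```

Activations, optimizer state and framework overhead come on top of this figure, so actual memory use during training or inference is higher.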