DistilCamemBERT-NLI

We present DistilCamemBERT-NLI, which is DistilCamemBERT fine-tuned for the Natural Language Inference (NLI) task in French, also known as Recognizing Textual Entailment (RTE). The model is trained on the XNLI dataset, where the task is to determine whether a premise entails, contradicts, or neither entails nor contradicts a hypothesis.

This model is similar to BaptisteDoyen/camembert-base-xnli, which is based on the CamemBERT model. The drawback of CamemBERT-based models appears when scaling, for example in the production phase: inference cost can become a technological issue, especially for cross-encoding tasks like this one. To counteract this effect, we propose this model, which, thanks to DistilCamemBERT, halves the inference time at the same power consumption.
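The "inference time halved" claim can be checked against the mean inference times reported in the Benchmark table of this card. The snippet below is only a small arithmetic sketch; the helper name `speedup` is ours and not part of any library.

```python
# Mean inference times (ms) copied from the Benchmark table of this card.
times_ms = {
    "cmarkea/distilcamembert-base-nli": 51.35,
    "BaptisteDoyen/camembert-base-xnli": 105.0,
    "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli": 299.18,
}

REFERENCE = "cmarkea/distilcamembert-base-nli"


def speedup(baseline: str, candidate: str = REFERENCE) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return times_ms[baseline] / times_ms[candidate]


for name in times_ms:
    # A ratio above 1 means the distilled model is faster than `name`.
    print(f"{name}: x{speedup(name):.2f}")
```

On these figures the ratio against the CamemBERT-based model is slightly above 2, which is consistent with the claim above.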

Dataset

The XNLI dataset from FLUE comprises 392,702 premise-hypothesis pairs for training and 5,010 pairs for testing. The goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B?), which is a classification task: given two sentences, predict one of three labels. Sentence A is called the premise and sentence B the hypothesis, so the modeling objective is:

$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
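As an illustration of this three-way classification, the sketch below turns three raw scores (logits) into a probability distribution over the classes with a softmax. The logit values and the label order are assumptions made for the example; the actual label order of a checkpoint should be read from its configuration.

```python
import math

# Assumed label order, for illustration only.
LABELS = ("contradiction", "entailment", "neutral")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nli_distribution(logits):
    """Map three logits to {label: probability}."""
    return dict(zip(LABELS, softmax(logits)))


# Hypothetical logits for one (premise, hypothesis) pair.
probs = nli_distribution([-1.2, 2.3, 0.4])
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

The predicted class is simply the label with the highest probability.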

Evaluation results

| class | precision (%) | f1-score (%) | support |
|---|---|---|---|
| global | 77.70 | 77.45 | 5,010 |
| contradiction | 78.00 | 79.54 | 1,670 |
| entailment | 82.90 | 78.87 | 1,670 |
| neutral | 72.18 | 74.04 | 1,670 |

Benchmark

We compare the DistilCamemBERT model to two other models that handle French. The first, BaptisteDoyen/camembert-base-xnli, is based on the aptly named CamemBERT, the French RoBERTa model; the second, MoritzLaurer/mDeBERTa-v3-base-mnli-xnli, is based on mDeBERTav3, a multilingual model. Performance is compared using accuracy and MCC (Matthews Correlation Coefficient). Mean inference time was measured on an AMD Ryzen 5 4500U @ 2.3GHz with 6 cores.

| model | time (ms) | accuracy (%) | MCC (x100) |
|---|---|---|---|
| cmarkea/distilcamembert-base-nli | 51.35 | 77.45 | 66.24 |
| BaptisteDoyen/camembert-base-xnli | 105.0 | 81.72 | 72.67 |
| MoritzLaurer/mDeBERTa-v3-base-mnli-xnli | 299.18 | 83.43 | 75.15 |
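For readers unfamiliar with the two metrics in this table, here is a minimal pure-Python sketch of accuracy and of the multi-class generalization of the Matthews Correlation Coefficient. It is not the evaluation script used for this card, and the toy labels below are made up for the example.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient, in [-1, 1]."""
    s = len(y_true)                                      # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correct predictions
    t_k = Counter(y_true)                                # true count per class
    p_k = Counter(y_pred)                                # predicted count per class
    labels = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = math.sqrt(cov_tt * cov_pp)
    # Convention: return 0 when the coefficient is undefined.
    return cov_tp / denom if denom else 0.0


# Toy example with made-up labels.
y_true = ["entailment", "entailment", "contradiction", "contradiction", "neutral", "neutral"]
y_pred = ["entailment", "contradiction", "contradiction", "contradiction", "neutral", "entailment"]
print(f"accuracy={accuracy(y_true, y_pred):.4f}  MCC={mcc(y_true, y_pred):.4f}")
```

Unlike accuracy, MCC takes the full confusion matrix into account, which makes it more informative when predictions are skewed toward one class. The table reports it multiplied by 100.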

Zero-shot classification

The main advantage of this kind of model is that it can serve as a zero-shot classifier, allowing text classification without any training. The task can be summarized as:

$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
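The formula above can be sketched in a few lines of Python: each candidate label gets an entailment probability from the NLI model, and these are normalized across the label set. The probabilities below are hypothetical values chosen for the example, and the function name `zero_shot_scores` is ours. Library implementations may normalize raw entailment logits rather than probabilities, so this follows the formula as written here and not necessarily any particular pipeline.

```python
import math


def zero_shot_scores(entailment_probs):
    """Normalize {label: P(entailment | hypothesis=label)} across labels."""
    exps = {label: math.exp(p) for label, p in entailment_probs.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical entailment probabilities for two candidate labels.
entailment_probs = {"positif": 0.92, "négatif": 0.05}
scores = zero_shot_scores(entailment_probs)
best = max(scores, key=scores.get)
print(best, scores)
```

The resulting scores sum to 1 over the candidate set, so adding or removing a candidate label changes every score.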

For this part, we use two datasets. The first, allocine, is used to train sentiment analysis models. It comprises two classes of movie-review appreciation: "positif" and "négatif". Here we use "Ce commentaire est {}." as the hypothesis template and "positif" and "négatif" as candidate labels.

| model | time (ms) | accuracy (%) | MCC (x100) |
|---|---|---|---|
| cmarkea/distilcamembert-base-nli | 195.54 | 80.59 | 63.71 |
| BaptisteDoyen/camembert-base-xnli | 378.39 | 86.37 | 73.74 |
| MoritzLaurer/mDeBERTa-v3-base-mnli-xnli | 520.58 | 84.97 | 70.05 |

The second, mlsum, is used to train summarization models. For this purpose, we aggregate sub-topics and select a few of them. We use the summary part of each article to predict its topic. In this case, the hypothesis template is "C'est un article traitant de {}." and the candidate labels are "économie", "politique", "sport" and "science".

| model | time (ms) | accuracy (%) | MCC (x100) |
|---|---|---|---|
| cmarkea/distilcamembert-base-nli | 217.77 | 79.30 | 70.55 |
| BaptisteDoyen/camembert-base-xnli | 448.27 | 70.7 | 64.10 |
| MoritzLaurer/mDeBERTa-v3-base-mnli-xnli | 591.34 | 64.45 | 58.67 |
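To make the role of the hypothesis template concrete, the sketch below builds the (premise, hypothesis) pairs that a cross-encoder would score, one per candidate label, using the two templates quoted above. The helper `build_pairs` and the sample texts are hypothetical and only illustrate the mechanism.

```python
def build_pairs(premise, candidate_labels, hypothesis_template):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [
        (premise, hypothesis_template.format(label))
        for label in candidate_labels
    ]


# Sentiment setup (allocine): two labels, so two pairs to score.
sentiment_pairs = build_pairs(
    "Un film magnifique, je le recommande.",  # hypothetical review
    ["positif", "négatif"],
    "Ce commentaire est {}.",
)

# Topic setup (mlsum): four labels, so four pairs to score.
topic_pairs = build_pairs(
    "Le club a remporté la finale hier soir.",  # hypothetical summary
    ["économie", "politique", "sport", "science"],
    "C'est un article traitant de {}.",
)

for premise, hypothesis in sentiment_pairs + topic_pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

Because every candidate label requires its own pass through the model, zero-shot inference cost grows with the number of labels, which is where a lighter model helps most.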

How to use DistilCamemBERT-NLI

from transformers import pipeline

classifier = pipeline(
    task='zero-shot-classification',
    model="cmarkea/distilcamembert-base-nli",
    tokenizer="cmarkea/distilcamembert-base-nli"
)
result = classifier(
    sequences="Le style très cinéphile de Quentin Tarantino "
    "se reconnaît entre autres par sa narration postmoderne "
    "et non linéaire, ses dialogues travaillés souvent "
    "émaillés de références à la culture populaire, et ses "
    "scènes hautement esthétiques mais d'une violence "
    "extrême, inspirées de films d'exploitation, d'arts "
    "martiaux ou de western spaghetti.",
    candidate_labels="cinéma, technologie, littérature, politique",
    hypothesis_template="Ce texte parle de {}."
)

result
{"labels": ["cinéma",
            "littérature",
            "technologie",
            "politique"],
 "scores": [0.7164115309715271,
            0.12878799438476562,
            0.1092301607131958,
            0.0455702543258667]}
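The returned dictionary pairs each label with its score. A short sketch of how it could be consumed is given below, using the output shown above as a literal; the `THRESHOLD` value is a hypothetical choice, not a recommendation from this card.

```python
# Output copied from the example above.
result = {
    "labels": ["cinéma", "littérature", "technologie", "politique"],
    "scores": [
        0.7164115309715271,
        0.12878799438476562,
        0.1092301607131958,
        0.0455702543258667,
    ],
}

# Pair labels with scores and pick the most likely one.
ranked = sorted(
    zip(result["labels"], result["scores"]),
    key=lambda pair: pair[1],
    reverse=True,
)
top_label, top_score = ranked[0]

# Hypothetical confidence threshold below which no label is accepted.
THRESHOLD = 0.5
accepted = top_label if top_score >= THRESHOLD else None
print(top_label, round(top_score, 4), accepted)
```

A threshold of this kind is one simple way to reject texts that match none of the candidate labels well.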

Optimum + ONNX

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

HUB_MODEL = "cmarkea/distilcamembert-base-nli"

tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL)
model = ORTModelForSequenceClassification.from_pretrained(HUB_MODEL)
# Zero-shot classification pipeline backed by the ONNX Runtime model
onnx_classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

# Quantized ONNX model
quantized_model = ORTModelForSequenceClassification.from_pretrained(
    HUB_MODEL, file_name="model_quantized.onnx"
)

Citation

@inproceedings{delestre:hal-03674695,
  TITLE = {{DistilCamemBERT : une distillation du mod{\`e}le fran{\c c}ais CamemBERT}},
  AUTHOR = {Delestre, Cyrile and Amar, Abibatou},
  URL = {https://hal.archives-ouvertes.fr/hal-03674695},
  BOOKTITLE = {{CAp (Conf{\'e}rence sur l'Apprentissage automatique)}},
  ADDRESS = {Vannes, France},
  YEAR = {2022},
  MONTH = Jul,
  KEYWORDS = {NLP ; Transformers ; CamemBERT ; Distillation},
  PDF = {https://hal.archives-ouvertes.fr/hal-03674695/file/cap2022.pdf},
  HAL_ID = {hal-03674695},
  HAL_VERSION = {v1},
}