SentenceTransformer based on sentence-transformers/LaBSE

This is a sentence-transformers model finetuned from sentence-transformers/LaBSE. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/LaBSE
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
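
The Pooling module uses the CLS token, and the Dense and Normalize modules apply a Tanh projection followed by L2 normalization, so the 768-dimensional embeddings are unit length and cosine similarity is equivalent to a dot product. As a quick sanity check, the maximum sequence length and output dimensionality listed above can be read back from the loaded model (a minimal sketch, assuming the model id shown in the Usage section below):

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("aminlouhichi/CDGSmilarity")

# Maximum sequence length handled by the Transformer module
print(model.max_seq_length)                      # 256
# Dimensionality of the sentence embeddings after the Dense projection
print(model.get_sentence_embedding_dimension())  # 768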

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aminlouhichi/CDGSmilarity")
# Run inference
sentences = [
    'Temps partiel surcotisé',
    'Temps partiel surcotisé de droit',
    'Départ définitif - Radiation des cadres',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
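
The similarity matrix can be used directly for ranking. Continuing from the snippet above, a minimal sketch that treats the first sentence as a query and ranks the other two by cosine similarity (the closely related administrative wording should score highest):

# Treat the first sentence as the query and the rest as candidates
query_embedding = model.encode("Temps partiel surcotisé")
candidates = [
    "Temps partiel surcotisé de droit",
    "Départ définitif - Radiation des cadres",
]
candidate_embeddings = model.encode(candidates)

# Cosine similarities between the query and each candidate (shape [1, 2])
scores = model.similarity(query_embedding, candidate_embeddings)[0]
for sentence, score in sorted(zip(candidates, scores), key=lambda x: -float(x[1])):
    print(f"{float(score):.4f}  {sentence}")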

Training Details

Training Dataset

Unnamed Dataset

  • Size: 295 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string, min: 4 tokens, mean: 9.31 tokens, max: 20 tokens
    • hypothesis: string, min: 4 tokens, mean: 10.41 tokens, max: 20 tokens
    • label: float, min: 0.9, mean: 0.95, max: 1.0
  • Samples:
    • premise: Compte rendu d'entretien professionnel
      hypothesis: Synthèse des discussions professionnelles
      label: 0.9820208462484844
    • premise: Congé Accident de trajet
      hypothesis: Arrêt de travail pour accident de trajet
      label: 0.9755981363214147
    • premise: Retrait ou suppression du CTI (complément de traitement indiciaire)
      hypothesis: Retrait du Complément de Traitement Indiciaire (CTI)
      label: 0.9524167934189104
  • Loss: CoSENTLoss (a construction sketch follows this list) with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
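
For reference, a minimal sketch of how this loss is typically set up with Sentence Transformers v3; the in-memory dataset below just reuses the first sample from the table above, and a real run would load the full 295-pair dataset instead:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/LaBSE")

# CoSENTLoss compares pairs of texts against a float similarity label;
# scale=20.0 and pairwise cosine similarity match the parameters listed above (and are the defaults)
train_loss = losses.CoSENTLoss(model=model, scale=20.0)

# Columns must match the dataset description: premise, hypothesis, label
train_dataset = Dataset.from_dict({
    "premise": ["Compte rendu d'entretien professionnel"],
    "hypothesis": ["Synthèse des discussions professionnelles"],
    "label": [0.9820208462484844],
})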
    

Evaluation Dataset

Unnamed Dataset

  • Size: 74 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string, min: 4 tokens, mean: 10.26 tokens, max: 25 tokens
    • hypothesis: string, min: 5 tokens, mean: 10.5 tokens, max: 20 tokens
    • label: float, min: 0.9, mean: 0.95, max: 1.0
  • Samples:
    • premise: Sanction disciplinaire
      hypothesis: Mesure punitive suite à une violation du règlement
      label: 0.958828679924412
    • premise: Départ définitif / Radiation - Décès
      hypothesis: Départ définitif suite au décès d'un agent
      label: 0.9003635138326387
    • premise: Nomination par intégration directe
      hypothesis: Intégration immédiate avec nomination
      label: 0.9993378836623817
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 30
  • warmup_ratio: 0.1
  • fp16: True
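
A minimal sketch of how these non-default values map onto SentenceTransformerTrainingArguments in Sentence Transformers v3 (the output directory is a hypothetical placeholder, not taken from this card):

from sentence_transformers.training_args import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="outputs/CDGSmilarity",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=30,
    warmup_ratio=0.1,
    fp16=True,
)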

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 30
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch    Step    Training Loss    Validation Loss
0.5263 10 12.4933 -
1.0526 20 10.5909 -
1.5789 30 7.0607 -
2.1053 40 4.7061 -
2.6316 50 4.7957 -
3.1579 60 4.624 -
3.6842 70 4.7854 -
4.2105 80 4.5902 -
4.7368 90 4.7051 -
5.2632 100 4.5562 4.6756
5.7895 110 4.6376 -
6.3158 120 4.4501 -
6.8421 130 4.5993 -
7.3684 140 4.4878 -
7.8947 150 4.5443 -
8.4211 160 4.3091 -
8.9474 170 4.6699 -
9.4737 180 4.3727 -
10.0 190 4.3888 -
10.5263 200 4.5099 5.3597
11.0526 210 4.3427 -
11.5789 220 4.4409 -
12.1053 230 4.3151 -
12.6316 240 4.3522 -
13.1579 250 4.3133 -
13.6842 260 4.3842 -
14.2105 270 4.2708 -
14.7368 280 4.387 -
15.2632 290 4.1131 -
15.7895 300 4.3394 5.5109
16.3158 310 4.2948 -
16.8421 320 4.3413 -
17.3684 330 4.1427 -
17.8947 340 4.5521 -
18.4211 350 4.2146 -
18.9474 360 4.2039 -
19.4737 370 4.1412 -
20.0 380 4.0869 -
20.5263 390 4.4763 -
21.0526 400 3.9572 5.7054
21.5789 410 4.2114 -
22.1053 420 4.2651 -
22.6316 430 4.2231 -
23.1579 440 4.0521 -
23.6842 450 4.3246 -
24.2105 460 3.9145 -
24.7368 470 4.1701 -
25.2632 480 4.0958 -
25.7895 490 4.1177 -
26.3158 500 4.2388 6.3162
26.8421 510 4.3043 -
27.3684 520 3.9634 -
27.8947 530 4.117 -
28.4211 540 4.1732 -
28.9474 550 4.1243 -
29.4737 560 3.7898 -
30.0 570 4.0227 -

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.1
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
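
To reproduce this environment, the listed library versions can be pinned at install time (a sketch; the PyTorch build used here was the CUDA 12.1 wheel, and other recent compatible versions should also work):

pip install torch==2.3.0 sentence-transformers==3.0.0 transformers==4.41.1 accelerate==0.30.1 datasets==2.19.1 tokenizers==0.19.1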

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}