bge-micro fine-tuned on chemical name to SMILES pairs

This is a sentence-transformers model fine-tuned from TaylorAI/bge-micro on pairs of chemical names and their SMILES strings. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: TaylorAI/bge-micro
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
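
For reference, the stack above amounts to a BERT encoder followed by attention-mask-aware mean pooling. Below is a minimal sketch of the same computation using transformers directly, assuming the checkpoint also loads through AutoModel/AutoTokenizer (as sentence-transformers checkpoints normally do):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("fpc/bge-micro-smiles")
model = AutoModel.from_pretrained("fpc/bge-micro-smiles")

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padded positions
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

inputs = tokenizer(["Phthalimide"], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    output = model(**inputs)
embeddings = mean_pool(output.last_hidden_state, inputs["attention_mask"])
print(embeddings.shape)  # torch.Size([1, 384])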

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("fpc/bge-micro-smiles")
# Run inference
sentences = [
    '(±)-cis-2-(4-methoxyphenyl)-3-acetoxy-5-[2-(dimethylamino)ethyl]-8-chloro-2,3-dihydro-1,5-benzothiazepin-4(5H)-one hydrochloride',
    'Cl.COC1=CC=C(C=C1)[C@@H]1SC2=C(N(C([C@@H]1OC(C)=O)=O)CCN(C)C)C=CC(=C2)Cl',
    'O[C@@H]1[C@H](O)[C@@H](Oc2nc(N3CCNCC3)nc3ccccc23)C[C@H]1O',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
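
Because the training pairs map chemical names (anchor) to SMILES strings (positive), a natural application is looking up the SMILES entry that best matches a free-text name. A minimal sketch, with a made-up two-entry corpus:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("fpc/bge-micro-smiles")

# Illustrative corpus of SMILES strings to search over
corpus = [
    "C(C)(C)(C)C1=CC=C(C=C1)Br",   # 4-t-butylbromobenzene
    "C1(C=2C(C(N1)=O)=CC=CC2)=O",  # phthalimide
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Embed a query name and retrieve the closest SMILES by cosine similarity
query_embedding = model.encode("Phthalimide", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)[0]
print(corpus[hits[0]["corpus_id"]], hits[0]["score"])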

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,210,255 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 5 tokens, mean: 42.57 tokens, max: 153 tokens
    • positive: string; min: 4 tokens, mean: 40.02 tokens, max: 325 tokens
  • Samples (anchor → positive):
    • 4-t-butylbromobenzene → C(C)(C)(C)C1=CC=C(C=C1)Br
    • 1-methyl-4-(morpholine-4-carbonyl)-N-(2-phenyl-[1,2,4]triazolo[1,5-a]pyridin-7-yl)-1H-pyrazole-5-carboxamide → CN1N=CC(=C1C(=O)NC1=CC=2N(C=C1)N=C(N2)C2=CC=CC=C2)C(=O)N2CCOCC2
    • Phthalimide → C1(C=2C(C(N1)=O)=CC=CC2)=O
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
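For orientation, this corresponds to roughly the following loss setup (a sketch based on the parameters above, not the original training code):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("TaylorAI/bge-micro")
# scale=20.0 and cosine similarity match the parameters listed above
loss = CachedMultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)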

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 512
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
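
A condensed sketch of how a run with these non-default values could be launched in Sentence Transformers 3.x, assuming the anchor/positive pairs are already available as a datasets.Dataset (the two example rows below are placeholders, not the actual training script):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("TaylorAI/bge-micro")

# Placeholder rows; the real dataset has ~3.2M name/SMILES pairs
train_dataset = Dataset.from_dict({
    "anchor": ["Phthalimide", "4-t-butylbromobenzene"],
    "positive": ["C1(C=2C(C(N1)=O)=CC=CC2)=O", "C(C)(C)(C)C1=CC=C(C=C1)Br"],
})

loss = CachedMultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-micro-smiles",
    per_device_train_batch_size=512,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_ratio=0.1,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()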

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 512
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss bge-micro-test_spearman_cosine
0.0159 100 6.1861 -
0.0319 200 6.0547 -
0.0478 300 5.6041 -
0.0638 400 4.9367 -
0.0797 500 4.3412 -
0.0957 600 3.8245 -
0.1116 700 3.3188 -
0.1276 800 2.869 -
0.1435 900 2.5149 -
0.1595 1000 2.2282 -
0.1754 1100 2.0046 -
0.1914 1200 1.8032 -
0.2073 1300 1.6289 -
0.2232 1400 1.4567 -
0.2392 1500 1.3326 -
0.2551 1600 1.2127 -
0.2711 1700 1.0909 -
0.2870 1800 1.0021 -
0.3030 1900 0.9135 -
0.3189 2000 0.8378 -
0.3349 2100 0.7758 -
0.3508 2200 0.7031 -
0.3668 2300 0.6418 -
0.3827 2400 0.5965 -
0.3987 2500 0.5461 -
0.4146 2600 0.5039 -
0.4306 2700 0.4674 -
0.4465 2800 0.4339 -
0.4624 2900 0.4045 -
0.4784 3000 0.373 -
0.4943 3100 0.3566 -
0.5103 3200 0.3348 -
0.5262 3300 0.3215 -
0.5422 3400 0.302 -
0.5581 3500 0.2826 -
0.5741 3600 0.2803 -
0.5900 3700 0.2616 -
0.6060 3800 0.2554 -
0.6219 3900 0.234 -
0.6379 4000 0.2306 -
0.6538 4100 0.2224 -
0.6697 4200 0.2141 -
0.6857 4300 0.2117 -
0.7016 4400 0.204 -
0.7176 4500 0.198 -
0.7335 4600 0.1986 -
0.7495 4700 0.1821 -
0.7654 4800 0.1813 -
0.7814 4900 0.1741 -
0.7973 5000 0.1697 -
0.8133 5100 0.1655 -
0.8292 5200 0.1623 -
0.8452 5300 0.1593 -
0.8611 5400 0.1566 -
0.8771 5500 0.151 -
0.8930 5600 0.1526 -
0.9089 5700 0.1453 -
0.9249 5800 0.1448 -
0.9408 5900 0.1369 -
0.9568 6000 0.1409 -
0.9727 6100 0.1373 -
0.9887 6200 0.133 -
1.0046 6300 0.1269 -
1.0206 6400 0.1274 -
1.0365 6500 0.1271 -
1.0525 6600 0.1216 -
1.0684 6700 0.1176 -
1.0844 6800 0.1208 -
1.1003 6900 0.1177 -
1.1162 7000 0.1175 -
1.1322 7100 0.1109 -
1.1481 7200 0.1118 -
1.1641 7300 0.1085 -
1.1800 7400 0.1155 -
1.1960 7500 0.1079 -
1.2119 7600 0.1087 -
1.2279 7700 0.1004 -
1.2438 7800 0.1084 -
1.2598 7900 0.1089 -
1.2757 8000 0.1012 -
1.2917 8100 0.1037 -
1.3076 8200 0.1004 -
1.3236 8300 0.0979 -
1.3395 8400 0.1007 -
1.3554 8500 0.0956 -
1.3714 8600 0.0972 -
1.3873 8700 0.0947 -
1.4033 8800 0.0931 -
1.4192 8900 0.0948 -
1.4352 9000 0.0925 -
1.4511 9100 0.0933 -
1.4671 9200 0.0888 -
1.4830 9300 0.0877 -
1.4990 9400 0.0889 -
1.5149 9500 0.0895 -
1.5309 9600 0.0892 -
1.5468 9700 0.089 -
1.5627 9800 0.0828 -
1.5787 9900 0.0906 -
1.5946 10000 0.0893 -
1.6106 10100 0.0849 -
1.6265 10200 0.0811 -
1.6425 10300 0.0823 -
1.6584 10400 0.0806 -
1.6744 10500 0.0815 -
1.6903 10600 0.0832 -
1.7063 10700 0.0856 -
1.7222 10800 0.081 -
1.7382 10900 0.0831 -
1.7541 11000 0.0767 -
1.7701 11100 0.0779 -
1.7860 11200 0.0792 -
1.8019 11300 0.0771 -
1.8179 11400 0.0783 -
1.8338 11500 0.0749 -
1.8498 11600 0.0755 -
1.8657 11700 0.0778 -
1.8817 11800 0.0753 -
1.8976 11900 0.0767 -
1.9136 12000 0.0725 -
1.9295 12100 0.0744 -
1.9455 12200 0.0743 -
1.9614 12300 0.0722 -
1.9774 12400 0.0712 -
1.9933 12500 0.0709 -
2.0092 12600 0.0694 -
2.0252 12700 0.0705 -
2.0411 12800 0.0715 -
2.0571 12900 0.0705 -
2.0730 13000 0.0653 -
2.0890 13100 0.0698 -
2.1049 13200 0.0676 -
2.1209 13300 0.0684 -
2.1368 13400 0.0644 -
2.1528 13500 0.0652 -
2.1687 13600 0.0673 -
2.1847 13700 0.067 -
2.2006 13800 0.0645 -
2.2166 13900 0.0633 -
2.2325 14000 0.0645 -
2.2484 14100 0.0698 -
2.2644 14200 0.0655 -
2.2803 14300 0.0654 -
2.2963 14400 0.0656 -
2.3122 14500 0.0631 -
2.3282 14600 0.0628 -
2.3441 14700 0.0671 -
2.3601 14800 0.0659 -
2.3760 14900 0.0619 -
2.3920 15000 0.0618 -
2.4079 15100 0.0624 -
2.4239 15200 0.0616 -
2.4398 15300 0.0631 -
2.4557 15400 0.0639 -
2.4717 15500 0.0585 -
2.4876 15600 0.0607 -
2.5036 15700 0.0615 -
2.5195 15800 0.062 -
2.5355 15900 0.0621 -
2.5514 16000 0.0608 -
2.5674 16100 0.0594 -
2.5833 16200 0.0631 -
2.5993 16300 0.0635 -
2.6152 16400 0.06 -
2.6312 16500 0.0581 -
2.6471 16600 0.0607 -
2.6631 16700 0.0577 -
2.6790 16800 0.0592 -
2.6949 16900 0.0625 -
2.7109 17000 0.0622 -
2.7268 17100 0.0573 -
2.7428 17200 0.0613 -
2.7587 17300 0.0587 -
2.7747 17400 0.0587 -
2.7906 17500 0.0588 -
2.8066 17600 0.0568 -
2.8225 17700 0.0573 -
2.8385 17800 0.0575 -
2.8544 17900 0.0575 -
2.8704 18000 0.0582 -
2.8863 18100 0.0577 -
2.9022 18200 0.057 -
2.9182 18300 0.0572 -
2.9341 18400 0.0558 -
2.9501 18500 0.0578 -
2.9660 18600 0.0567 -
2.9820 18700 0.0569 -
2.9979 18800 0.0547 -
3.0139 18900 0.0542 -
3.0298 19000 0.0563 -
3.0458 19100 0.0549 -
3.0617 19200 0.0531 -
3.0777 19300 0.053 -
3.0936 19400 0.0557 -
3.1096 19500 0.0546 -
3.1255 19600 0.0518 -
3.1414 19700 0.0517 -
3.1574 19800 0.0528 -
3.1733 19900 0.0551 -
3.1893 20000 0.0544 -
3.2052 20100 0.0526 -
3.2212 20200 0.0494 -
3.2371 20300 0.0537 -
3.2531 20400 0.0568 -
3.2690 20500 0.0525 -
3.2850 20600 0.0566 -
3.3009 20700 0.0539 -
3.3169 20800 0.0531 -
3.3328 20900 0.0524 -
3.3487 21000 0.0543 -
3.3647 21100 0.0537 -
3.3806 21200 0.0524 -
3.3966 21300 0.0516 -
3.4125 21400 0.0537 -
3.4285 21500 0.0515 -
3.4444 21600 0.0537 -
3.4604 21700 0.0526 -
3.4763 21800 0.0508 -
3.4923 21900 0.0526 -
3.5082 22000 0.0521 -
3.5242 22100 0.054 -
3.5401 22200 0.053 -
3.5561 22300 0.0509 -
3.5720 22400 0.0526 -
3.5879 22500 0.0551 -
3.6039 22600 0.0556 -
3.6198 22700 0.0497 -
3.6358 22800 0.0515 -
3.6517 22900 0.0514 -
3.6677 23000 0.0503 -
3.6836 23100 0.0515 -
3.6996 23200 0.0553 -
3.7155 23300 0.0519 -
3.7315 23400 0.0549 -
3.7474 23500 0.0522 -
3.7634 23600 0.0526 -
3.7793 23700 0.0525 -
3.7952 23800 0.051 -
3.8112 23900 0.0509 -
3.8271 24000 0.0503 -
3.8431 24100 0.0524 -
3.8590 24200 0.0526 -
3.8750 24300 0.0512 -
3.8909 24400 0.0518 -
3.9069 24500 0.0521 -
3.9228 24600 0.0524 -
3.9388 24700 0.051 -
3.9547 24800 0.0535 -
3.9707 24900 0.0508 -
3.9866 25000 0.0514 -
4.0 25084 - nan

Framework Versions

  • Python: 3.10.9
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.4.1+cu124
  • Accelerate: 0.33.0
  • Datasets: 2.18.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, 
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}