snowflake-arctic-embed-m-klej-dyk

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-m
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
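
The pooling and normalization stages listed above can be illustrated with a minimal sketch. It assumes the token vectors have already been produced by the Transformer module; the function name and the toy 4-dimensional input are hypothetical and are not part of the library.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Return the L2-normalized vector of the first ([CLS]) token."""
    # pooling_mode_cls_token: True -> only the first token's vector is kept
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        return list(cls_vector)
    # Normalize() -> scale to unit length so dot product equals cosine similarity
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4-dimensional vectors (the real model uses 768).
toy_tokens = [
    [3.0, 4.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, cosine similarity between two such embeddings reduces to a plain dot product.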

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# Repository ID as shown on this model's Hub page
model = SentenceTransformer("ve88ifz2/snowflake-arctic-embed-m-klej-dyk-v0.1")
# Run inference
sentences = [
    'Chłopiec z Nariokotome',
    'ile wynosiła objętość mózgu chłopca z Nariokotome?',
    'gdzie znajduje się czwarty polski cmentarz katyński?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
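
The model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64 (see Training Details), so shorter prefixes of an embedding can be used for cheaper search. The sketch below shows one way to do this by hand: keep the first values, rescale to unit length, and rank by cosine similarity. The helper names and the toy 8-dimensional vectors are hypothetical stand-ins for real model output. Recent Sentence Transformers releases also accept a truncate_dim argument when loading a model, which is not used here.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def rank_documents(query, documents, dim):
    """Return document indices sorted by cosine similarity, best first."""
    q = truncate_and_normalize(query, dim)
    scores = []
    for doc in documents:
        d = truncate_and_normalize(doc, dim)
        scores.append(sum(a * b for a, b in zip(q, d)))
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

# Toy 8-dimensional vectors standing in for real 768-dimensional embeddings.
query = [0.9, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1]
documents = [
    [0.8, 0.2, 0.1, 0.1, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.9, 0.3, 0.0, 0.2, 0.1, 0.0, 0.4],
]
for dim in (8, 4):
    print(dim, rank_documents(query, documents, dim))
```

Smaller dimensions trade retrieval quality for speed and storage; the Evaluation section reports how the metrics change at each size.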

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.1851
cosine_accuracy@3 0.4808
cosine_accuracy@5 0.625
cosine_accuracy@10 0.726
cosine_precision@1 0.1851
cosine_precision@3 0.1603
cosine_precision@5 0.125
cosine_precision@10 0.0726
cosine_recall@1 0.1851
cosine_recall@3 0.4808
cosine_recall@5 0.625
cosine_recall@10 0.726
cosine_ndcg@10 0.4479
cosine_mrr@10 0.359
cosine_map@100 0.3672

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.1755
cosine_accuracy@3 0.4712
cosine_accuracy@5 0.613
cosine_accuracy@10 0.7019
cosine_precision@1 0.1755
cosine_precision@3 0.1571
cosine_precision@5 0.1226
cosine_precision@10 0.0702
cosine_recall@1 0.1755
cosine_recall@3 0.4712
cosine_recall@5 0.613
cosine_recall@10 0.7019
cosine_ndcg@10 0.4334
cosine_mrr@10 0.3474
cosine_map@100 0.3564

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.1562
cosine_accuracy@3 0.4543
cosine_accuracy@5 0.5649
cosine_accuracy@10 0.6731
cosine_precision@1 0.1562
cosine_precision@3 0.1514
cosine_precision@5 0.113
cosine_precision@10 0.0673
cosine_recall@1 0.1562
cosine_recall@3 0.4543
cosine_recall@5 0.5649
cosine_recall@10 0.6731
cosine_ndcg@10 0.4103
cosine_mrr@10 0.3261
cosine_map@100 0.3351

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.1635
cosine_accuracy@3 0.3918
cosine_accuracy@5 0.5072
cosine_accuracy@10 0.6058
cosine_precision@1 0.1635
cosine_precision@3 0.1306
cosine_precision@5 0.1014
cosine_precision@10 0.0606
cosine_recall@1 0.1635
cosine_recall@3 0.3918
cosine_recall@5 0.5072
cosine_recall@10 0.6058
cosine_ndcg@10 0.3758
cosine_mrr@10 0.3027
cosine_map@100 0.3117

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.149
cosine_accuracy@3 0.3389
cosine_accuracy@5 0.4183
cosine_accuracy@10 0.4928
cosine_precision@1 0.149
cosine_precision@3 0.113
cosine_precision@5 0.0837
cosine_precision@10 0.0493
cosine_recall@1 0.149
cosine_recall@3 0.3389
cosine_recall@5 0.4183
cosine_recall@10 0.4928
cosine_ndcg@10 0.3178
cosine_mrr@10 0.2621
cosine_map@100 0.2704
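
In every table above, precision@k equals accuracy@k divided by k (for example 0.4808 / 3 ≈ 0.1603) and recall@k equals accuracy@k, which is consistent with each query having exactly one relevant passage. Under that assumption, the metrics can be sketched as follows. The function name and the toy ranks are hypothetical, and this is not the library's evaluator code.

```python
import math

def ir_metrics(ranks, k):
    """Compute @k metrics when each query has a single relevant passage.

    ranks: 1-based rank of the relevant passage for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,   # at most one relevant item in the top k
        "recall": accuracy,          # one relevant item in total
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With one relevant item the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Toy example: five queries; one passage is never retrieved, one ranks 12th.
toy_ranks = [1, 3, 2, None, 12]
metrics = ir_metrics(toy_ranks, k=10)
for name, value in metrics.items():
    print(f"{name}@10: {value:.4f}")
```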

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,738 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    column    type    min        mean          max
    positive  string  6 tokens   94.61 tokens  512 tokens
    anchor    string  10 tokens  30.71 tokens  76 tokens
  • Samples:
    • positive: Marsz Ochotników (chin.
      anchor: kto jest kompozytorem chińskiego hymnu narodowego Marsz Ochotników?
    • positive: Wybrane przykłady: Święta Rodzina – Maryja z Dzieciątkiem na ręku, niekiedy obok niej stoi św. Józef Rodzina Marii – przedstawienie w którym pojawia się Święta Rodzina oraz postaci spokrewnione z Marią. Maria w połogu (Maria in puerperio) – leżąca na łożu Maria opiekuje się Dzieciątkiem Maria karmiąca (Maria lactans) – Maria karmiąca swą piersią Dzieciątko Orantka – kobieta modląca się z podniesionymi rękami (częsty motyw ikon wschodnich); Sacra Conversazione – Matka Boska tronująca z Dzieciątkiem, otoczona stojącymi postaciami świętych Pietà – opłakująca Jezusa, trzymając na kolanach jego ciało po śmierci na krzyżu; Hodegetria – ujęcie popiersia Maryi, trzymającej na rękach małego Jezusa, częsty motyw w ikonach Eleusa – formalnie podobne do przedstawienia Hodegetrii lecz Maryja policzkiem przytula się do policzka Jezusa Immaculata – Niepokalane Poczęcie Najświętszej Maryi Panny.
      anchor: kto zamiast Maryi trzyma nowonarodzonego Jezusa w scenie Bożego Narodzenia przedstawionej na poliptyku z Marią i Dzieciątkiem Jezus?
    • positive: Pomnik Josepha von Eichendorffa w Brzeziu Pomnik Josepha von Eichendorffa – odtworzony w 2006 roku pomnik znanego niemieckiego poety epoki romantyzmu związanego z ziemią raciborską, Josepha von Eichendorffa.
      anchor: po ilu latach odtworzono wysadzony w 1945 roku pomnik Josepha von Eichendorffa w Raciborzu-Brzeziu?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
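
As a rough illustration of what these parameters mean, the sketch below applies an in-batch-negatives ranking loss to embeddings truncated at each Matryoshka dimension and sums the weighted results. It is a plain-Python approximation under stated assumptions, not the library's implementation: the scale of 20.0 is assumed to match the MultipleNegativesRankingLoss default, and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mnr_loss(anchors, positives, scale=20.0):
    """Cross-entropy where positives[i] is the target for anchors[i] and
    every other positive in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embeddings."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Training on every prefix length at once is what lets the shorter embeddings remain usable on their own, at the cost of the lower scores reported for the smaller dimensions.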
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
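
These settings imply an effective batch of 16 × 16 = 256 samples per optimizer step on a single device. The sketch below works through the step arithmetic and a cosine schedule with linear warmup. It assumes a single device and that the no_duplicates sampler does not change the batch count; the schedule function is a hypothetical re-implementation of the usual formula, not a call into Transformers. The resulting total of 70 steps agrees with the training logs.

```python
import math

def cosine_lr_with_warmup(step, total_steps, warmup_steps, base_lr):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

train_samples = 3738
per_device_batch = 16
grad_accum = 16
epochs = 5
warmup_ratio = 0.1
base_lr = 2e-05

batches_per_epoch = math.ceil(train_samples / per_device_batch)  # 234 batches
steps_per_epoch = batches_per_epoch // grad_accum                # 14 optimizer steps
total_steps = steps_per_epoch * epochs                           # 70 optimizer steps
warmup_steps = math.ceil(total_steps * warmup_ratio)             # 7 warmup steps
effective_batch = per_device_batch * grad_accum                  # 256 samples per step

print(total_steps, warmup_steps, effective_batch)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, cosine_lr_with_warmup(step, total_steps, warmup_steps, base_lr))
```

With 234 batches per epoch, one optimizer step covers 16 / 234 ≈ 0.0684 of an epoch, which matches the epoch value logged for step 1.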

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.0684 1 9.3155 - - - - -
0.1368 2 9.1788 - - - - -
0.2051 3 8.8387 - - - - -
0.2735 4 8.2961 - - - - -
0.3419 5 8.0242 - - - - -
0.4103 6 7.2329 - - - - -
0.4786 7 5.4386 - - - - -
0.5470 8 6.1186 - - - - -
0.6154 9 4.9714 - - - - -
0.6838 10 5.1958 - - - - -
0.7521 11 5.1135 - - - - -
0.8205 12 4.6971 - - - - -
0.8889 13 4.5559 - - - - -
0.9573 14 3.9357 0.2842 0.3098 0.3191 0.2238 0.3209
1.0256 15 3.7916 - - - - -
1.0940 16 3.6393 - - - - -
1.1624 17 3.7733 - - - - -
1.2308 18 3.6974 - - - - -
1.2991 19 3.5964 - - - - -
1.3675 20 3.4118 - - - - -
1.4359 21 3.2022 - - - - -
1.5043 22 2.8133 - - - - -
1.5726 23 3.0871 - - - - -
1.6410 24 2.9559 - - - - -
1.7094 25 2.8192 - - - - -
1.7778 26 3.462 - - - - -
1.8462 27 3.1435 - - - - -
1.9145 28 2.8001 - - - - -
1.9829 29 2.5643 0.3134 0.3359 0.3563 0.2588 0.3671
2.0513 30 2.4295 - - - - -
2.1197 31 2.3892 - - - - -
2.1880 32 2.5228 - - - - -
2.2564 33 2.4906 - - - - -
2.3248 34 2.5358 - - - - -
2.3932 35 2.2806 - - - - -
2.4615 36 2.0083 - - - - -
2.5299 37 2.5088 - - - - -
2.5983 38 2.0628 - - - - -
2.6667 39 2.193 - - - - -
2.7350 40 2.4783 - - - - -
2.8034 41 2.382 - - - - -
2.8718 42 2.2017 - - - - -
2.9402 43 1.9739 0.3111 0.3392 0.3572 0.2657 0.3659
3.0085 44 2.0332 - - - - -
3.0769 45 1.9983 - - - - -
3.1453 46 1.8612 - - - - -
3.2137 47 1.9897 - - - - -
3.2821 48 2.2514 - - - - -
3.3504 49 2.0092 - - - - -
3.4188 50 1.7399 - - - - -
3.4872 51 1.5825 - - - - -
3.5556 52 2.1501 - - - - -
3.6239 53 1.4505 - - - - -
3.6923 54 1.8575 - - - - -
3.7607 55 2.3882 - - - - -
3.8291 56 2.1119 - - - - -
3.8974 57 1.8992 - - - - -
3.9658 58 1.8323 0.3117 0.3365 0.3558 0.2683 0.3670
4.0342 59 1.5938 - - - - -
4.1026 60 1.552 - - - - -
4.1709 61 1.907 - - - - -
4.2393 62 1.8304 - - - - -
4.3077 63 1.8775 - - - - -
4.3761 64 1.8654 - - - - -
4.4444 65 1.7944 - - - - -
4.5128 66 1.8335 - - - - -
4.5812 67 1.8823 - - - - -
4.6496 68 1.6479 - - - - -
4.7179 69 1.5771 - - - - -
4.7863 70 2.1911 0.3117 0.3351 0.3564 0.2704 0.3672
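  • In the original table, the bold row denotes the saved checkpoint. The cosine_map@100 values reported under Evaluation match the final row (step 70).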

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.1
  • Accelerate: 0.27.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Size

  • Format: Safetensors
  • Model size: 109M params
  • Tensor type: F32
  • Hub repository: ve88ifz2/snowflake-arctic-embed-m-klej-dyk-v0.1