e5 cogcache small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
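
The stack above is a BERT encoder, mean pooling over token embeddings, then L2 normalization. For illustration, a minimal sketch of the equivalent computation using the underlying transformers model directly (SentenceTransformer performs all of this internally):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("srikarvar/e5-small-cogcachedata-1")
encoder = AutoModel.from_pretrained("srikarvar/e5-small-cogcachedata-1")

batch = tokenizer(
    ["How can I improve my English?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

# (1) Pooling: mean over token embeddings, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384])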

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("srikarvar/e5-small-cogcachedata-1")
# Run inference
sentences = [
    'How can I improve my English?',
    'How can I improve my Spanish?',
    'How can I gain weight?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
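
The embeddings also cover the other use cases listed above, such as semantic search. A minimal sketch using sentence_transformers.util; the corpus and query here are illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("srikarvar/e5-small-cogcachedata-1")

corpus = [
    "How can I improve my English?",
    "How can I gain weight?",
    "What are the ingredients of a pizza?",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("Tips for getting better at English", convert_to_tensor=True)
# Returns one ranked hit list per query; each hit has 'corpus_id' and 'score'
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))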

Evaluation

Metrics

Binary Classification

| Metric                        | Value  |
|:------------------------------|:-------|
| cosine_accuracy               | 0.6846 |
| cosine_accuracy_threshold     | 0.8909 |
| cosine_f1                     | 0.8038 |
| cosine_f1_threshold           | 0.8909 |
| cosine_precision              | 0.672  |
| cosine_recall                 | 1.0    |
| cosine_ap                     | 0.7428 |
| dot_accuracy                  | 0.6846 |
| dot_accuracy_threshold        | 0.8909 |
| dot_f1                        | 0.8038 |
| dot_f1_threshold              | 0.8909 |
| dot_precision                 | 0.672  |
| dot_recall                    | 1.0    |
| dot_ap                        | 0.7428 |
| manhattan_accuracy            | 0.6846 |
| manhattan_accuracy_threshold  | 6.8578 |
| manhattan_f1                  | 0.8038 |
| manhattan_f1_threshold        | 7.2272 |
| manhattan_precision           | 0.672  |
| manhattan_recall              | 1.0    |
| manhattan_ap                  | 0.743  |
| euclidean_accuracy            | 0.6846 |
| euclidean_accuracy_threshold  | 0.4672 |
| euclidean_f1                  | 0.8038 |
| euclidean_f1_threshold        | 0.4672 |
| euclidean_precision           | 0.672  |
| euclidean_recall              | 1.0    |
| euclidean_ap                  | 0.7428 |
| max_accuracy                  | 0.6846 |
| max_accuracy_threshold        | 6.8578 |
| max_f1                        | 0.8038 |
| max_f1_threshold              | 7.2272 |
| max_precision                 | 0.672  |
| max_recall                    | 1.0    |
| max_ap                        | 0.743  |

Binary Classification

| Metric                        | Value   |
|:------------------------------|:--------|
| cosine_accuracy               | 0.8923  |
| cosine_accuracy_threshold     | 0.7951  |
| cosine_f1                     | 0.9231  |
| cosine_f1_threshold           | 0.7486  |
| cosine_precision              | 0.8571  |
| cosine_recall                 | 1.0     |
| cosine_ap                     | 0.9716  |
| dot_accuracy                  | 0.8923  |
| dot_accuracy_threshold        | 0.7951  |
| dot_f1                        | 0.9231  |
| dot_f1_threshold              | 0.7486  |
| dot_precision                 | 0.8571  |
| dot_recall                    | 1.0     |
| dot_ap                        | 0.9716  |
| manhattan_accuracy            | 0.8846  |
| manhattan_accuracy_threshold  | 10.6305 |
| manhattan_f1                  | 0.9171  |
| manhattan_f1_threshold        | 10.6305 |
| manhattan_precision           | 0.8557  |
| manhattan_recall              | 0.9881  |
| manhattan_ap                  | 0.9702  |
| euclidean_accuracy            | 0.8923  |
| euclidean_accuracy_threshold  | 0.6402  |
| euclidean_f1                  | 0.9231  |
| euclidean_f1_threshold        | 0.709   |
| euclidean_precision           | 0.8571  |
| euclidean_recall              | 1.0     |
| euclidean_ap                  | 0.9716  |
| max_accuracy                  | 0.8923  |
| max_accuracy_threshold        | 10.6305 |
| max_f1                        | 0.9231  |
| max_f1_threshold              | 10.6305 |
| max_precision                 | 0.8571  |
| max_recall                    | 1.0     |
| max_ap                        | 0.9716  |
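
These tables follow the output of sentence-transformers' BinaryClassificationEvaluator, which finds the best decision threshold per similarity/distance function on labeled sentence pairs (the max_* rows take the best score across functions). A minimal sketch of running such an evaluation; the pairs and labels below are illustrative, not the actual dev set:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("srikarvar/e5-small-cogcachedata-1")

# Illustrative labeled pairs: 1 = duplicate/paraphrase, 0 = distinct
sentences1 = ["What are the ingredients of a pizza?", "How can I improve my English?"]
sentences2 = ["What are the ingredients of pizza", "How can I gain weight?"]
labels = [1, 0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="dev")
results = evaluator(model)  # dict of metrics, e.g. accuracy/F1/AP per similarity function
print(results)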

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,000 training samples
  • Columns: label, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:

    |         | label                  | sentence1                                         | sentence2                                         |
    |:--------|:-----------------------|:--------------------------------------------------|:--------------------------------------------------|
    | type    | int                    | string                                            | string                                            |
    | details | 0: ~55.10%, 1: ~44.90% | min: 6 tokens, mean: 13.24 tokens, max: 66 tokens | min: 4 tokens, mean: 13.29 tokens, max: 55 tokens |

  • Samples:

    | label | sentence1                            | sentence2                           |
    |:------|:-------------------------------------|:------------------------------------|
    | 1     | What are the ingredients of a pizza? | What are the ingredients of a pizza |
    | 1     | What are the ingredients of a pizza? | What are the ingredients of pizza   |
    | 1     | What are the ingredients of a pizza? | What are ingredients of pizza       |

  • Loss: OnlineContrastiveLoss (see the sketch below)
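
OnlineContrastiveLoss computes a contrastive loss over labeled pairs, but only the hard positives and hard negatives within each batch contribute. A minimal sketch of building a dataset in this shape and attaching the loss; the rows are illustrative:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Columns must match the card: label (int), sentence1, sentence2
train_dataset = Dataset.from_dict({
    "label": [1, 0],
    "sentence1": ["What are the ingredients of a pizza?", "How can I improve my English?"],
    "sentence2": ["What are the ingredients of pizza", "How can I gain weight?"],
})

# Pulls each batch's hard positives/negatives into a contrastive objective
loss = OnlineContrastiveLoss(model)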

Evaluation Dataset

Unnamed Dataset

  • Size: 130 evaluation samples
  • Columns: label, sentence1, and sentence2
  • Approximate statistics based on the first 1000 samples:

    |         | label                  | sentence1                                         | sentence2                                         |
    |:--------|:-----------------------|:--------------------------------------------------|:--------------------------------------------------|
    | type    | int                    | string                                            | string                                            |
    | details | 0: ~35.38%, 1: ~64.62% | min: 6 tokens, mean: 10.85 tokens, max: 20 tokens | min: 5 tokens, mean: 11.48 tokens, max: 22 tokens |

  • Samples:

    | label | sentence1                            | sentence2                           |
    |:------|:-------------------------------------|:------------------------------------|
    | 1     | What are the ingredients of a pizza? | What are the ingredients of a pizza |
    | 1     | What are the ingredients of a pizza? | What are the ingredients of pizza   |
    | 1     | What are the ingredients of a pizza? | What are ingredients of pizza       |

  • Loss: OnlineContrastiveLoss

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 6
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
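
A minimal sketch of wiring these settings into the Sentence Transformers v3 trainer, reusing model, loss, and train_dataset from the dataset sketch above. The eval_dataset is assumed to be an analogous 130-pair set, and the output path is a placeholder:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="e5-small-cogcachedata-1",  # placeholder output path
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
    warmup_ratio=0.1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # batch_sampler: no_duplicates
)

trainer = SentenceTransformerTrainer(
    model=model,                  # from the dataset sketch above
    args=args,
    train_dataset=train_dataset,  # labeled pairs as described above
    eval_dataset=eval_dataset,    # assumed: analogous 130-pair eval set
    loss=loss,
)
trainer.train()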

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | Validation Loss | e5-cogcache-dev_max_ap | quora-duplicates-dev_max_ap |
|:------|:-----|:--------------|:----------------|:-----------------------|:----------------------------|
| 0     | 0    | -             | -               | -                      | 0.7430                      |
| 1.0   | 125  | -             | 0.4486          | -                      | 0.8547                      |
| 2.0   | 250  | -             | 0.2319          | -                      | 0.9373                      |
| 3.0   | 375  | -             | 0.1411          | -                      | 0.9634                      |
| 4.0   | 500  | 0.2324        | 0.1785          | -                      | 0.9687                      |
| 5.0   | 625  | -             | 0.1681          | -                      | 0.9713                      |
| 6.0   | 750  | -             | 0.1477          | 0.9716                 | 0.9716                      |

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}