SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dunzhang/stella_en_400M_v5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
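
The pooling module mean-pools the token embeddings and the final Dense layer maps them from 1024 to 1024 dimensions with an identity activation, which is what produces the figures listed under Model Description. Once the model is loaded (see Usage below), you can check them directly; a minimal sketch:

# Assumes `model` is the SentenceTransformer loaded as shown in the Usage section below.
print(model.max_seq_length)                      # 512 (Maximum Sequence Length)
print(model.get_sentence_embedding_dimension())  # 1024 (Output Dimensionality)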

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub; trust_remote_code=True is likely needed for the custom NewModel architecture
model = SentenceTransformer("thomaskim1130/stella_en_400M_v5-FinanceRAG", trust_remote_code=True)
# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Title: \nText: what was the average for "other" loans held in 2012 and 2011?',
    'Title: \nText: LOANS HELD FOR SALE Table 15: Loans Held For Sale\n| In millions | December 312012 | December 312011 |\n| Commercial mortgages at fair value | $772 | $843 |\n| Commercial mortgages at lower of cost or market | 620 | 451 |\n| Total commercial mortgages | 1,392 | 1,294 |\n| Residential mortgages at fair value | 2,096 | 1,415 |\n| Residential mortgages at lower of cost or market | 124 | 107 |\n| Total residential mortgages | 2,220 | 1,522 |\n| Other | 81 | 120 |\n| Total | $3,693 | $2,936 |\nWe stopped originating commercial mortgage loans held for sale designated at fair value in 2008 and continue pursuing opportunities to reduce these positions at appropriate prices.\nAt December 31, 2012, the balance relating to these loans was $772 million, compared to $843 million at December 31, 2011.\nWe sold $32 million in unpaid principal balances of these commercial mortgage loans held for sale carried at fair value in 2012 and sold $25 million in 2011.',
    'Title: \nText: Investments and Derivative Instruments (continued) Security Unrealized Loss Aging The following tables present the Company’s unrealized loss aging for AFS securities by type and length of time the security was in a continuous unrealized loss position.\n|  | December 31, 2011 |\n|  | Less Than 12 Months | 12 Months or More | Total |\n|  | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized |\n|  | Cost | Value | Losses | Cost | Value | Losses | Cost | Value | Losses |\n| ABS | $629 | $594 | $-35 | $1,169 | $872 | $-297 | $1,798 | $1,466 | $-332 |\n| CDOs | 81 | 59 | -22 | 2,709 | 2,383 | -326 | 2,790 | 2,442 | -348 |\n| CMBS | 1,297 | 1,194 | -103 | 2,144 | 1,735 | -409 | 3,441 | 2,929 | -512 |\n| Corporate [1] | 4,388 | 4,219 | -169 | 3,268 | 2,627 | -570 | 7,656 | 6,846 | -739 |\n| Foreign govt./govt. agencies | 218 | 212 | -6 | 51 | 47 | -4 | 269 | 259 | -10 |\n| Municipal | 299 | 294 | -5 | 627 | 560 | -67 | 926 | 854 | -72 |\n| RMBS | 415 | 330 | -85 | 1,206 | 835 | -371 | 1,621 | 1,165 | -456 |\n| U.S. Treasuries | 343 | 341 | -2 | — | — | — | 343 | 341 | -2 |\n| Total fixed maturities | 7,670 | 7,243 | -427 | 11,174 | 9,059 | -2,044 | 18,844 | 16,302 | -2,471 |\n| Equity securities | 167 | 138 | -29 | 439 | 265 | -174 | 606 | 403 | -203 |\n| Total securities in an unrealized loss | $7,837 | $7,381 | $-456 | $11,613 | $9,324 | $-2,218 | $19,450 | $16,705 | $-2,674 |\nDecember 31, 2010\n|  | December 31, 2010 |\n|  | Less Than 12 Months | 12 Months or More | Total |\n|  | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized | Amortized | Fair | Unrealized |\n|  | Cost | Value | Losses | Cost | Value | Losses | Cost | Value | Losses |\n| ABS | $302 | $290 | $-12 | $1,410 | $1,026 | $-384 | $1,712 | $1,316 | $-396 |\n| CDOs | 321 | 293 | -28 | 2,724 | 2,274 | -450 | 3,045 | 2,567 | -478 |\n| CMBS | 556 | 530 | -26 | 3,962 | 3,373 | -589 | 4,518 | 3,903 | -615 |\n| Corporate | 5,533 | 5,329 | -199 | 4,017 | 3,435 | -548 | 9,550 | 8,764 | -747 |\n| Foreign govt./govt. agencies | 356 | 349 | -7 | 78 | 68 | -10 | 434 | 417 | -17 |\n| Municipal | 7,485 | 7,173 | -312 | 1,046 | 863 | -183 | 8,531 | 8,036 | -495 |\n| RMBS | 1,744 | 1,702 | -42 | 1,567 | 1,147 | -420 | 3,311 | 2,849 | -462 |\n| U.S. Treasuries | 2,436 | 2,321 | -115 | 158 | 119 | -39 | 2,594 | 2,440 | -154 |\n| Total fixed maturities | 18,733 | 17,987 | -741 | 14,962 | 12,305 | -2,623 | 33,695 | 30,292 | -3,364 |\n| Equity securities | 53 | 52 | -1 | 637 | 506 | -131 | 690 | 558 | -132 |\n| Total securities in an unrealized loss | $18,786 | $18,039 | $-742 | $15,599 | $12,811 | $-2,754 | $34,385 | $30,850 | $-3,496 |\n[1] Unrealized losses exclude the change in fair value of bifurcated embedded derivative features of certain securities.\nSubsequent changes in fair value are recorded in net realized capital gains (losses).\nAs of December 31, 2011, AFS securities in an unrealized loss position, comprised of 2,549 securities, primarily related to corporate securities within the financial services sector, CMBS, and RMBS which have experienced significant price deterioration.\nAs of December 31, 2011, 75% of these securities were depressed less than 20% of cost or amortized cost.\nThe decline in unrealized losses during 2011 was primarily attributable to a decline in interest rates, partially offset by credit spread widening.\nMost of the securities depressed for twelve months or more relate to structured securities with exposure to commercial and residential real estate, as well as certain floating rate corporate securities or those securities with greater than 10 years to maturity, concentrated in the financial services sector.\nCurrent market spreads continue to be significantly wider for structured securities with exposure to commercial and residential real estate, as compared to spreads at the security’s respective purchase date, largely due to the economic and market uncertainties regarding future performance of commercial and residential real estate.\nIn addition, the majority of securities have a floating-rate coupon referenced to a market index where rates have declined substantially.\nThe Company neither has an intention to sell nor does it expect to be required to sell the securities outlined above.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
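
In a retrieval setting, the first entry above acts as an instructed query (note the "Instruct: ... Query:" prefix used for queries in this model's training data) and the other two entries are candidate passages. A minimal sketch of ranking the passages by cosine similarity, continuing from the embeddings computed above:

import torch

# Score the two candidate passages (indices 1 and 2) against the instructed query (index 0);
# model.similarity uses the model's configured similarity function (cosine similarity here).
query_scores = model.similarity(embeddings[0:1], embeddings[1:])  # shape: [1, 2]
ranking = torch.argsort(query_scores[0], descending=True)
for rank, idx in enumerate(ranking.tolist(), start=1):
    print(f"Rank {rank}: sentences[{idx + 1}] (score={query_scores[0, idx].item():.4f})")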

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.3617
cosine_accuracy@3 0.5194
cosine_accuracy@5 0.6092
cosine_accuracy@10 0.7015
cosine_precision@1 0.3617
cosine_precision@3 0.1788
cosine_precision@5 0.1267
cosine_precision@10 0.0752
cosine_recall@1 0.331
cosine_recall@3 0.4768
cosine_recall@5 0.5614
cosine_recall@10 0.6548
cosine_ndcg@10 0.496
cosine_mrr@10 0.4668
cosine_map@100 0.4482
dot_accuracy@1 0.3325
dot_accuracy@3 0.5243
dot_accuracy@5 0.5922
dot_accuracy@10 0.6748
dot_precision@1 0.3325
dot_precision@3 0.1796
dot_precision@5 0.1248
dot_precision@10 0.0726
dot_recall@1 0.3059
dot_recall@3 0.4762
dot_recall@5 0.5446
dot_recall@10 0.6273
dot_ndcg@10 0.4723
dot_mrr@10 0.4422
dot_map@100 0.4264
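
These metric names match the output of Sentence Transformers' InformationRetrievalEvaluator. A minimal sketch of running such an evaluation yourself, where the queries, corpus, and relevant_docs mappings below are hypothetical placeholders for your own data:

from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical placeholder data: map query ids and document ids to their texts,
# and each query id to the set of ids of its relevant documents.
queries = {"q1": 'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Title: \nText: what was the average for "other" loans held in 2012 and 2011?'}
corpus = {"d1": "Title: \nText: LOANS HELD FOR SALE Table 15: Loans Held For Sale ..."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="Evaluate",  # matches the "Evaluate_cosine_map@100" column in the training logs below
)
print(evaluator(model))  # dict of metrics such as cosine_accuracy@10, cosine_ndcg@10, cosine_map@100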

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,256 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: type string; min: 29 tokens, mean: 45.01 tokens, max: 121 tokens
    • sentence_1: type string; min: 26 tokens, mean: 406.1 tokens, max: 512 tokens
  • Samples:
    Sample 1
      sentence_0: Instruct: Given a web search query, retrieve relevant passages that answer the query.
        Query: Title:
        Text: In the year with largest amount of Net credit losses, what's the amount of Revenues, net of interest expense and Total operating expenses? (in million)
      sentence_1: Title:
        Text: Comparison of Five-Year Cumulative Total Return The following graph compares the cumulative total return on Citigroup’s common stock with the S&P 500 Index and the S&P Financial Index over the five-year period extending through December31, 2009.
        The graph assumes that $100 was invested on December31, 2004 in Citigroup’s common stock, the S&P 500 Index and the S&P Financial Index and that all dividends were reinvested.
    Sample 2
      sentence_0: Instruct: Given a web search query, retrieve relevant passages that answer the query.
        Query: Title:
        Text: what was the total of net earnings attributable to pmi in 2017?
      sentence_1: Title:
        Text: the fair value of the psu award at the date of grant is amortized to expense over the performance period , which is typically three years after the date of the award , or upon death , disability or reaching the age of 58 .
        as of december 31 , 2017 , pmi had $ 34 million of total unrecognized compensation cost related to non-vested psu awards .
        this cost is recognized over a weighted-average performance cycle period of two years , or upon death , disability or reaching the age of 58 .
        during the years ended december 31 , 2017 , and 2016 , there were no psu awards that vested .
        pmi did not grant any psu awards during note 10 .
        earnings per share : unvested share-based payment awards that contain non-forfeitable rights to dividends or dividend equivalents are participating securities and therefore are included in pmi 2019s earnings per share calculation pursuant to the two-class method .
        basic and diluted earnings per share ( 201ceps 201d ) were calculated using the following: .
        ( in millions )
    Sample 3
      sentence_0: Instruct: Given a web search query, retrieve relevant passages that answer the query.
        Query: Title:
        Text: for the terrestar acquisition what will the final cash purchase price be in millions paid upon closing?
      sentence_1: Title:
        Text: dish network corporation notes to consolidated financial statements - continued this transaction was accounted for as a business combination using purchase price accounting .
        the allocation of the purchase consideration is in the table below .
        purchase allocation ( in thousands ) .
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
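
In code, this loss can be constructed with the parameters listed above; a minimal sketch, where model is the SentenceTransformer instance being fine-tuned (for example, the base model loaded in the training sketch after the hyperparameter list below):

from sentence_transformers import losses, util

# Each (sentence_0, sentence_1) pair is treated as a positive; the other in-batch
# sentence_1 entries serve as negatives. scale and similarity_fct match the parameters above.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)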
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • fp16: True
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin
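
Putting these settings together, a minimal sketch of a fine-tuning run is shown below. The dataset contents and output path are hypothetical placeholders, and the loss is the MultipleNegativesRankingLoss sketched above:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers import losses, util
from sentence_transformers.training_args import BatchSamplers, MultiDatasetBatchSamplers

# Base model to fine-tune; trust_remote_code is likely needed for the custom NewModel code.
model = SentenceTransformer("dunzhang/stella_en_400M_v5", trust_remote_code=True)

# Hypothetical toy dataset with the sentence_0 / sentence_1 columns described above.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: ..."],
    "sentence_1": ["Title: \nText: ..."],
})

# The loss from the previous sketch, wrapping the model being trained.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

args = SentenceTransformerTrainingArguments(
    output_dir="stella_en_400M_v5-FinanceRAG",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # hypothetical; use a held-out split in practice
    loss=loss,
    evaluator=evaluator,  # e.g. the InformationRetrievalEvaluator sketched in the Metrics section
)
trainer.train()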

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Evaluate_cosine_map@100
0 0 0.2566
1.0 141 0.3931
2.0 282 0.4482

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3
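
To approximate this environment, you can pin the listed versions when installing (the torch build shown above additionally targets CUDA 12.1):

pip install sentence-transformers==3.1.1 transformers==4.45.2 torch==2.5.1 accelerate==1.1.1 datasets==3.1.0 tokenizers==0.20.3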

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}