NeoBERT Cross-Encoder: Semantic Similarity (STS)
Cross-encoders are high-performing encoder models that compare two texts and output a similarity score between 0 and 1. I've found the cross-encoder/stsb-roberta-large model to be very useful for building evaluators for LLM outputs. They're simple to use, fast, and very accurate.
Features
- High performing: Achieves Pearson 0.9124 and Spearman 0.9087 on the STS-Benchmark test set.
- Efficient architecture: Based on the NeoBERT design (250M parameters), offering faster inference speeds.
- Extended context length: Processes sequences up to 4096 tokens, which is great for LLM output evals (see the sketch after this list).
- Diversified training: Pretrained on dleemiller/wiki-sim and fine-tuned on sentence-transformers/stsb.
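To make the LLM-output-evals point concrete, here is a minimal, hypothetical sketch that grades a generated answer against a reference answer with the sentence-transformers CrossEncoder API; the example texts and the explicit max_length setting are illustrative assumptions, not part of the released configuration.

```python
from sentence_transformers import CrossEncoder

# Illustrative sketch: grade a (potentially long) LLM answer against a reference.
# max_length=4096 is an assumption to make the full context window explicit;
# the saved model config may already handle this.
model = CrossEncoder("dleemiller/NeoCE-sts", max_length=4096)

reference = "The Peace of Westphalia, signed in 1648, ended the Thirty Years' War."
llm_answer = "Signed in 1648, the Westphalian treaties brought the Thirty Years' War to a close."

# Higher scores mean the generated answer stays semantically close to the reference.
score = model.predict([(llm_answer, reference)])[0]
print(f"similarity: {score:.3f}")
```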
Performance
| Model | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed |
|---|---|---|---|---|---|
| ModernCE-large-sts | 0.9256 | 0.9215 | 8192 | 395M | Medium |
| ModernCE-base-sts | 0.9162 | 0.9122 | 8192 | 149M | Fast |
| NeoCE-sts | 0.9124 | 0.9087 | 4096 | 250M | Fast |
| stsb-roberta-large | 0.9147 | - | 512 | 355M | Slow |
| stsb-distilroberta-base | 0.8792 | - | 512 | 82M | Fast |
Usage
To use NeoCE for semantic similarity tasks, you can load the model with the Hugging Face sentence-transformers library:
```python
from sentence_transformers import CrossEncoder

# Load the NeoCE model
model = CrossEncoder("dleemiller/NeoCE-sts")

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)
print(scores)  # Outputs: array([0.9184, 0.0123], dtype=float32)
```
Output
The model returns similarity scores in the range [0, 1], where higher scores indicate stronger semantic similarity.
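If the scores feed an LLM-output evaluator, one simple option is to threshold them into pass/fail judgments; the cut-off in the sketch below is arbitrary and would need tuning for any real evaluation.

```python
import numpy as np

# Scores from the Usage example above; 0.8 is an arbitrary, illustrative cut-off.
scores = np.array([0.9184, 0.0123], dtype=np.float32)
threshold = 0.8

passed = scores >= threshold
print(passed)  # [ True False]
```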
Training Details
Pretraining
The model was pretrained on the pair-score-sampled subset of the dleemiller/wiki-sim dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.
- Classifier Dropout: A relatively high classifier dropout of 0.3 was used to reduce over-reliance on teacher scores.
- Objective: STS-B-style similarity scores from cross-encoder/stsb-roberta-large, used as regression targets (a sketch of producing such teacher scores follows this list).
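A minimal sketch of what producing those teacher targets could look like, assuming the public cross-encoder/stsb-roberta-large checkpoint and a couple of stand-in sentence pairs; the actual pretraining data loading and training loop are not reproduced here.

```python
from sentence_transformers import CrossEncoder

# Teacher model whose predictions serve as regression targets for pretraining.
teacher = CrossEncoder("cross-encoder/stsb-roberta-large")

# Stand-in pairs representing the dleemiller/wiki-sim (pair-score-sampled) data.
pairs = [
    ("A man is playing a guitar.", "Someone is strumming an instrument."),
    ("A man is playing a guitar.", "The stock market fell sharply today."),
]

# Teacher scores in [0, 1]; the student cross-encoder regresses toward these values.
teacher_scores = teacher.predict(pairs)
print(teacher_scores)
```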
Fine-Tuning
Fine-tuning was performed on the sentence-transformers/stsb dataset.
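For orientation, here is a hypothetical fine-tuning sketch using the classic sentence-transformers CrossEncoder.fit API on that dataset; the starting checkpoint, batch size, epoch count, and output path are assumptions rather than the released training configuration.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# STS-B splits from the Hub; similarity scores are already normalized to [0, 1].
stsb = load_dataset("sentence-transformers/stsb")

def to_examples(split):
    return [
        InputExample(texts=[row["sentence1"], row["sentence2"]], label=float(row["score"]))
        for row in split
    ]

train_samples = to_examples(stsb["train"])
dev_samples = to_examples(stsb["validation"])

# Placeholder starting point; the released model began from the wiki-sim pretrained checkpoint.
model = CrossEncoder("dleemiller/NeoCE-sts", num_labels=1)

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)
evaluator = CECorrelationEvaluator.from_input_examples(dev_samples, name="stsb-dev")

# Regression fine-tuning with Pearson/Spearman tracking on the dev split.
model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=1,
    warmup_steps=100,
    output_path="neoce-sts-finetuned",
)
```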
Model Card
- Architecture: NeoBERT
- Pretraining Data: dleemiller/wiki-sim (pair-score-sampled)
- Fine-Tuning Data: sentence-transformers/stsb
Thank You
Thanks to the chandar-lab team for providing the NeoBERT models, and to the Sentence Transformers team for their leadership in transformer encoder models.
Citation
If you use this model in your research, please cite:
```bibtex
@misc{neocests2025,
  author = {Miller, D. Lee},
  title = {NeoCE STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co./dleemiller/NeoCE-sts},
}
```
License
This model is licensed under the MIT License.
Evaluation results
- Pearson Cosine on STS dev (self-reported): 0.921
- Spearman Cosine on STS dev (self-reported): 0.921
- Pearson Cosine on STS test (self-reported): 0.912
- Spearman Cosine on STS test (self-reported): 0.909