xiaobu-embedding
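Model: multi-task fine-tuned from the GTE model [1].
Data: chit-chat Query-Query pairs, knowledge Query-Doc pairs, and the open-source BGE Query-Doc data [2]. Positive examples were cleaned and medium-difficulty negatives were mined, for about 6M examples in total (quality matters more than quantity).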
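The model card does not say how the medium-difficulty negatives were mined. One common approach, shown here only as a sketch under that assumption, is to rank candidates by similarity to the query and sample negatives from a middle band of the ranking, skipping the very top (likely false negatives) and the tail (too easy). The function name and the `low_rank`, `high_rank`, and `k` parameters are hypothetical, not values from this model.

```python
def mine_medium_negatives(scores, positive_ids, low_rank=10, high_rank=50, k=5):
    """Pick negatives from a middle band of the similarity ranking.

    scores: dict mapping candidate id -> similarity to the query.
    positive_ids: set of ids known to be relevant (never returned).
    low_rank, high_rank, k: hypothetical knobs, not taken from the model card.
    """
    # Rank all candidates from most to least similar to the query.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Skip the top band (possible false negatives) and the tail (too easy).
    band = ranked[low_rank:high_rank]
    negatives = [cid for cid in band if cid not in positive_ids]
    return negatives[:k]


# Toy example: 100 candidates whose score decreases with the id.
toy_scores = {i: 1.0 - i / 100 for i in range(100)}
negatives = mine_medium_negatives(
    toy_scores, positive_ids={0, 12}, low_rank=10, high_rank=20, k=3
)
print(negatives)
```

In practice the scores would come from an existing embedding model, and the band limits would be tuned on held-out data.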
Usage (Sentence-Transformers)
pip install -U sentence-transformers
Similarity computation:
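from sentence_transformers import SentenceTransformer
# Two batches of sentences to compare (placeholder sample strings).
sentences_1 = ["样例数据-1", "样例数据-2"]
sentences_2 = ["样例数据-3", "样例数据-4"]
model = SentenceTransformer('lier007/xiaobu-embedding')
# normalize_embeddings=True returns unit-length vectors.
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
# For unit-length vectors the dot product equals the cosine similarity.
similarity = embeddings_1 @ embeddings_2.T
print(similarity)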
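Because the embeddings are L2-normalized, the matrix product in the snippet above is a cosine-similarity matrix: entry (i, j) compares sentence i of the first batch with sentence j of the second. The following sketch illustrates this with hand-written toy vectors in plain Python, so it runs without downloading the model. The helper names `l2_normalize` and `similarity_matrix` are illustrative and not part of sentence-transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(a, b):
    """Dot product of every row in a with every row in b."""
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


# Toy 2-d "embeddings" standing in for model.encode(...) output.
emb_1 = [l2_normalize(v) for v in [[1.0, 0.0], [1.0, 1.0]]]
emb_2 = [l2_normalize(v) for v in [[0.0, 2.0], [3.0, 3.0]]]

sim = similarity_matrix(emb_1, emb_2)
# For each row of the first batch, the index of its most similar row in the second.
best_match = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(sim)
print(best_match)
```

The same argmax over rows of `similarity` gives a simple nearest-neighbor lookup when the second batch is a document collection.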
Evaluation
See the BGE Chinese CMTEB evaluation [2].
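For sentence-pair tasks such as AFQMC and ATEC, a metric like cos_sim_spearman is the Spearman rank correlation between the model's cosine similarities and the gold labels. The sketch below shows that computation in plain Python on made-up numbers. It is an illustration of the metric, not the MTEB implementation, and the similarity values and labels are invented for the example.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: cosine similarities for four sentence pairs and binary gold labels.
cos_sims = [0.91, 0.35, 0.78, 0.12]
gold = [1, 0, 1, 0]
rho = spearman(cos_sims, gold)
print(round(rho, 4))
```

The scores listed under Evaluation results appear to be such correlations scaled by 100.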
Finetune
See the BGE fine-tuning module [2].
Reference
Evaluation results
- cos_sim_pearson on MTEB AFQMC (validation set, self-reported): 49.379
- cos_sim_spearman on MTEB AFQMC (validation set, self-reported): 54.847
- euclidean_pearson on MTEB AFQMC (validation set, self-reported): 53.050
- euclidean_spearman on MTEB AFQMC (validation set, self-reported): 54.848
- manhattan_pearson on MTEB AFQMC (validation set, self-reported): 53.063
- manhattan_spearman on MTEB AFQMC (validation set, self-reported): 54.874
- cos_sim_pearson on MTEB ATEC (test set, self-reported): 48.160
- cos_sim_spearman on MTEB ATEC (test set, self-reported): 55.132
- euclidean_pearson on MTEB ATEC (test set, self-reported): 55.436
- euclidean_spearman on MTEB ATEC (test set, self-reported): 55.132