Jamie@TitanML bwang0911 committed on
Commit c39622b · 0 Parent(s)

Duplicate from jinaai/jina-embeddings-v2-base-en

Co-authored-by: Bo Wang <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
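The patterns above route large binary artifacts (weights, archives, serialized tensors) through Git LFS, so the repository stores small pointer files while the Hub serves the actual blobs. As a minimal sketch of how such an LFS-tracked file is typically fetched (assumptions: the `huggingface_hub` client is installed, the source repository `jinaai/jina-embeddings-v2-base-en` is used purely for illustration, and the exact weight filename may differ):

```python
# Minimal sketch: resolve and download one LFS-tracked artifact from the Hub.
# The filename below is an assumption; the repository may ship
# pytorch_model.bin and/or model.safetensors.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="jinaai/jina-embeddings-v2-base-en",  # source repo this commit duplicates
    filename="pytorch_model.bin",
)
print(weights_path)  # local cache path; the LFS pointer is resolved transparently
```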
1_Pooling/config.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false
+ }
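The pooling config above enables only mean pooling over 768-dimensional token embeddings (CLS, max, and sqrt-length pooling are all disabled), so each input sentence is reduced to a single 768-dimensional vector by averaging its token states. A minimal sketch of what that pooling step amounts to (assumptions: `transformers` and `torch` are installed, and the source checkpoint `jinaai/jina-embeddings-v2-base-en` is used with `trust_remote_code=True` because it ships custom modeling code):

```python
# Minimal sketch of the mean pooling declared in 1_Pooling/config.json.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "jinaai/jina-embeddings-v2-base-en"  # source repo of this duplicate (illustrative)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

sentences = ["How is the weather today?", "What is the current weather like today?"]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# pooling_mode_mean_tokens: average token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # torch.Size([2, 768]) == word_embedding_dimension
```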
README.md ADDED
@@ -0,0 +1,2809 @@
+ ---
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - mteb
+ datasets:
+ - allenai/c4
+ language: en
+ inference: false
+ license: apache-2.0
+ model-index:
+ - name: jina-embedding-b-en-v2
+ results:
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_counterfactual
+ name: MTEB AmazonCounterfactualClassification (en)
+ config: en
+ split: test
+ revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+ metrics:
+ - type: accuracy
+ value: 74.73134328358209
+ - type: ap
+ value: 37.765427081831035
+ - type: f1
+ value: 68.79367444339518
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_polarity
+ name: MTEB AmazonPolarityClassification
+ config: default
+ split: test
+ revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+ metrics:
+ - type: accuracy
+ value: 88.544275
+ - type: ap
+ value: 84.61328675662887
+ - type: f1
+ value: 88.51879035862375
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_reviews_multi
+ name: MTEB AmazonReviewsClassification (en)
+ config: en
+ split: test
+ revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+ metrics:
+ - type: accuracy
+ value: 45.263999999999996
+ - type: f1
+ value: 43.778759656699435
+ - task:
+ type: Retrieval
+ dataset:
+ type: arguana
+ name: MTEB ArguAna
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 21.693
+ - type: map_at_10
+ value: 35.487
+ - type: map_at_100
+ value: 36.862
+ - type: map_at_1000
+ value: 36.872
+ - type: map_at_3
+ value: 30.049999999999997
+ - type: map_at_5
+ value: 32.966
+ - type: mrr_at_1
+ value: 21.977
+ - type: mrr_at_10
+ value: 35.565999999999995
+ - type: mrr_at_100
+ value: 36.948
+ - type: mrr_at_1000
+ value: 36.958
+ - type: mrr_at_3
+ value: 30.121
+ - type: mrr_at_5
+ value: 33.051
+ - type: ndcg_at_1
+ value: 21.693
+ - type: ndcg_at_10
+ value: 44.181
+ - type: ndcg_at_100
+ value: 49.982
+ - type: ndcg_at_1000
+ value: 50.233000000000004
+ - type: ndcg_at_3
+ value: 32.830999999999996
+ - type: ndcg_at_5
+ value: 38.080000000000005
+ - type: precision_at_1
+ value: 21.693
+ - type: precision_at_10
+ value: 7.248
+ - type: precision_at_100
+ value: 0.9769999999999999
+ - type: precision_at_1000
+ value: 0.1
+ - type: precision_at_3
+ value: 13.632
+ - type: precision_at_5
+ value: 10.725
+ - type: recall_at_1
+ value: 21.693
+ - type: recall_at_10
+ value: 72.475
+ - type: recall_at_100
+ value: 97.653
+ - type: recall_at_1000
+ value: 99.57300000000001
+ - type: recall_at_3
+ value: 40.896
+ - type: recall_at_5
+ value: 53.627
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/arxiv-clustering-p2p
+ name: MTEB ArxivClusteringP2P
+ config: default
+ split: test
+ revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+ metrics:
+ - type: v_measure
+ value: 45.39242428696777
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/arxiv-clustering-s2s
+ name: MTEB ArxivClusteringS2S
+ config: default
+ split: test
+ revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+ metrics:
+ - type: v_measure
+ value: 36.675626784714
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/askubuntudupquestions-reranking
+ name: MTEB AskUbuntuDupQuestions
+ config: default
+ split: test
+ revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+ metrics:
+ - type: map
+ value: 62.247725694904034
+ - type: mrr
+ value: 74.91359978894604
+ - task:
+ type: STS
+ dataset:
+ type: mteb/biosses-sts
+ name: MTEB BIOSSES
+ config: default
+ split: test
+ revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+ metrics:
+ - type: cos_sim_pearson
+ value: 82.68003802970496
+ - type: cos_sim_spearman
+ value: 81.23438110096286
+ - type: euclidean_pearson
+ value: 81.87462986142582
+ - type: euclidean_spearman
+ value: 81.23438110096286
+ - type: manhattan_pearson
+ value: 81.61162566600755
+ - type: manhattan_spearman
+ value: 81.11329400456184
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/banking77
+ name: MTEB Banking77Classification
+ config: default
+ split: test
+ revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+ metrics:
+ - type: accuracy
+ value: 84.01298701298701
+ - type: f1
+ value: 83.31690714969382
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/biorxiv-clustering-p2p
+ name: MTEB BiorxivClusteringP2P
+ config: default
+ split: test
+ revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+ metrics:
+ - type: v_measure
+ value: 37.050108150972086
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/biorxiv-clustering-s2s
+ name: MTEB BiorxivClusteringS2S
+ config: default
+ split: test
+ revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+ metrics:
+ - type: v_measure
+ value: 30.15731442819715
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackAndroidRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 31.391999999999996
+ - type: map_at_10
+ value: 42.597
+ - type: map_at_100
+ value: 44.07
+ - type: map_at_1000
+ value: 44.198
+ - type: map_at_3
+ value: 38.957
+ - type: map_at_5
+ value: 40.961
+ - type: mrr_at_1
+ value: 37.196
+ - type: mrr_at_10
+ value: 48.152
+ - type: mrr_at_100
+ value: 48.928
+ - type: mrr_at_1000
+ value: 48.964999999999996
+ - type: mrr_at_3
+ value: 45.446
+ - type: mrr_at_5
+ value: 47.205999999999996
+ - type: ndcg_at_1
+ value: 37.196
+ - type: ndcg_at_10
+ value: 49.089
+ - type: ndcg_at_100
+ value: 54.471000000000004
+ - type: ndcg_at_1000
+ value: 56.385
+ - type: ndcg_at_3
+ value: 43.699
+ - type: ndcg_at_5
+ value: 46.22
+ - type: precision_at_1
+ value: 37.196
+ - type: precision_at_10
+ value: 9.313
+ - type: precision_at_100
+ value: 1.478
+ - type: precision_at_1000
+ value: 0.198
+ - type: precision_at_3
+ value: 20.839
+ - type: precision_at_5
+ value: 14.936
+ - type: recall_at_1
+ value: 31.391999999999996
+ - type: recall_at_10
+ value: 61.876
+ - type: recall_at_100
+ value: 84.214
+ - type: recall_at_1000
+ value: 95.985
+ - type: recall_at_3
+ value: 46.6
+ - type: recall_at_5
+ value: 53.588
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackEnglishRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 29.083
+ - type: map_at_10
+ value: 38.812999999999995
+ - type: map_at_100
+ value: 40.053
+ - type: map_at_1000
+ value: 40.188
+ - type: map_at_3
+ value: 36.111
+ - type: map_at_5
+ value: 37.519000000000005
+ - type: mrr_at_1
+ value: 36.497
+ - type: mrr_at_10
+ value: 44.85
+ - type: mrr_at_100
+ value: 45.546
+ - type: mrr_at_1000
+ value: 45.593
+ - type: mrr_at_3
+ value: 42.686
+ - type: mrr_at_5
+ value: 43.909
+ - type: ndcg_at_1
+ value: 36.497
+ - type: ndcg_at_10
+ value: 44.443
+ - type: ndcg_at_100
+ value: 48.979
+ - type: ndcg_at_1000
+ value: 51.154999999999994
+ - type: ndcg_at_3
+ value: 40.660000000000004
+ - type: ndcg_at_5
+ value: 42.193000000000005
+ - type: precision_at_1
+ value: 36.497
+ - type: precision_at_10
+ value: 8.433
+ - type: precision_at_100
+ value: 1.369
+ - type: precision_at_1000
+ value: 0.185
+ - type: precision_at_3
+ value: 19.894000000000002
+ - type: precision_at_5
+ value: 13.873
+ - type: recall_at_1
+ value: 29.083
+ - type: recall_at_10
+ value: 54.313
+ - type: recall_at_100
+ value: 73.792
+ - type: recall_at_1000
+ value: 87.629
+ - type: recall_at_3
+ value: 42.257
+ - type: recall_at_5
+ value: 47.066
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackGamingRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 38.556000000000004
+ - type: map_at_10
+ value: 50.698
+ - type: map_at_100
+ value: 51.705
+ - type: map_at_1000
+ value: 51.768
+ - type: map_at_3
+ value: 47.848
+ - type: map_at_5
+ value: 49.358000000000004
+ - type: mrr_at_1
+ value: 43.95
+ - type: mrr_at_10
+ value: 54.191
+ - type: mrr_at_100
+ value: 54.852999999999994
+ - type: mrr_at_1000
+ value: 54.885
+ - type: mrr_at_3
+ value: 51.954
+ - type: mrr_at_5
+ value: 53.13
+ - type: ndcg_at_1
+ value: 43.95
+ - type: ndcg_at_10
+ value: 56.516
+ - type: ndcg_at_100
+ value: 60.477000000000004
+ - type: ndcg_at_1000
+ value: 61.746
+ - type: ndcg_at_3
+ value: 51.601
+ - type: ndcg_at_5
+ value: 53.795
+ - type: precision_at_1
+ value: 43.95
+ - type: precision_at_10
+ value: 9.009
+ - type: precision_at_100
+ value: 1.189
+ - type: precision_at_1000
+ value: 0.135
+ - type: precision_at_3
+ value: 22.989
+ - type: precision_at_5
+ value: 15.473
+ - type: recall_at_1
+ value: 38.556000000000004
+ - type: recall_at_10
+ value: 70.159
+ - type: recall_at_100
+ value: 87.132
+ - type: recall_at_1000
+ value: 96.16
+ - type: recall_at_3
+ value: 56.906
+ - type: recall_at_5
+ value: 62.332
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackGisRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 24.238
+ - type: map_at_10
+ value: 32.5
+ - type: map_at_100
+ value: 33.637
+ - type: map_at_1000
+ value: 33.719
+ - type: map_at_3
+ value: 30.026999999999997
+ - type: map_at_5
+ value: 31.555
+ - type: mrr_at_1
+ value: 26.328000000000003
+ - type: mrr_at_10
+ value: 34.44
+ - type: mrr_at_100
+ value: 35.455999999999996
+ - type: mrr_at_1000
+ value: 35.521
+ - type: mrr_at_3
+ value: 32.034
+ - type: mrr_at_5
+ value: 33.565
+ - type: ndcg_at_1
+ value: 26.328000000000003
+ - type: ndcg_at_10
+ value: 37.202
+ - type: ndcg_at_100
+ value: 42.728
+ - type: ndcg_at_1000
+ value: 44.792
+ - type: ndcg_at_3
+ value: 32.368
+ - type: ndcg_at_5
+ value: 35.008
+ - type: precision_at_1
+ value: 26.328000000000003
+ - type: precision_at_10
+ value: 5.7059999999999995
+ - type: precision_at_100
+ value: 0.8880000000000001
+ - type: precision_at_1000
+ value: 0.11100000000000002
+ - type: precision_at_3
+ value: 13.672
+ - type: precision_at_5
+ value: 9.74
+ - type: recall_at_1
+ value: 24.238
+ - type: recall_at_10
+ value: 49.829
+ - type: recall_at_100
+ value: 75.21
+ - type: recall_at_1000
+ value: 90.521
+ - type: recall_at_3
+ value: 36.867
+ - type: recall_at_5
+ value: 43.241
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackMathematicaRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 15.378
+ - type: map_at_10
+ value: 22.817999999999998
+ - type: map_at_100
+ value: 23.977999999999998
+ - type: map_at_1000
+ value: 24.108
+ - type: map_at_3
+ value: 20.719
+ - type: map_at_5
+ value: 21.889
+ - type: mrr_at_1
+ value: 19.03
+ - type: mrr_at_10
+ value: 27.022000000000002
+ - type: mrr_at_100
+ value: 28.011999999999997
+ - type: mrr_at_1000
+ value: 28.096
+ - type: mrr_at_3
+ value: 24.855
+ - type: mrr_at_5
+ value: 26.029999999999998
+ - type: ndcg_at_1
+ value: 19.03
+ - type: ndcg_at_10
+ value: 27.526
+ - type: ndcg_at_100
+ value: 33.040000000000006
+ - type: ndcg_at_1000
+ value: 36.187000000000005
+ - type: ndcg_at_3
+ value: 23.497
+ - type: ndcg_at_5
+ value: 25.334
+ - type: precision_at_1
+ value: 19.03
+ - type: precision_at_10
+ value: 4.963
+ - type: precision_at_100
+ value: 0.893
+ - type: precision_at_1000
+ value: 0.13
+ - type: precision_at_3
+ value: 11.360000000000001
+ - type: precision_at_5
+ value: 8.134
+ - type: recall_at_1
+ value: 15.378
+ - type: recall_at_10
+ value: 38.061
+ - type: recall_at_100
+ value: 61.754
+ - type: recall_at_1000
+ value: 84.259
+ - type: recall_at_3
+ value: 26.788
+ - type: recall_at_5
+ value: 31.326999999999998
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackPhysicsRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 27.511999999999997
+ - type: map_at_10
+ value: 37.429
+ - type: map_at_100
+ value: 38.818000000000005
+ - type: map_at_1000
+ value: 38.924
+ - type: map_at_3
+ value: 34.625
+ - type: map_at_5
+ value: 36.064
+ - type: mrr_at_1
+ value: 33.300999999999995
+ - type: mrr_at_10
+ value: 43.036
+ - type: mrr_at_100
+ value: 43.894
+ - type: mrr_at_1000
+ value: 43.936
+ - type: mrr_at_3
+ value: 40.825
+ - type: mrr_at_5
+ value: 42.028
+ - type: ndcg_at_1
+ value: 33.300999999999995
+ - type: ndcg_at_10
+ value: 43.229
+ - type: ndcg_at_100
+ value: 48.992000000000004
+ - type: ndcg_at_1000
+ value: 51.02100000000001
+ - type: ndcg_at_3
+ value: 38.794000000000004
+ - type: ndcg_at_5
+ value: 40.65
+ - type: precision_at_1
+ value: 33.300999999999995
+ - type: precision_at_10
+ value: 7.777000000000001
+ - type: precision_at_100
+ value: 1.269
+ - type: precision_at_1000
+ value: 0.163
+ - type: precision_at_3
+ value: 18.351
+ - type: precision_at_5
+ value: 12.762
+ - type: recall_at_1
+ value: 27.511999999999997
+ - type: recall_at_10
+ value: 54.788000000000004
+ - type: recall_at_100
+ value: 79.105
+ - type: recall_at_1000
+ value: 92.49199999999999
+ - type: recall_at_3
+ value: 41.924
+ - type: recall_at_5
+ value: 47.026
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackProgrammersRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 24.117
+ - type: map_at_10
+ value: 33.32
+ - type: map_at_100
+ value: 34.677
+ - type: map_at_1000
+ value: 34.78
+ - type: map_at_3
+ value: 30.233999999999998
+ - type: map_at_5
+ value: 31.668000000000003
+ - type: mrr_at_1
+ value: 29.566
+ - type: mrr_at_10
+ value: 38.244
+ - type: mrr_at_100
+ value: 39.245000000000005
+ - type: mrr_at_1000
+ value: 39.296
+ - type: mrr_at_3
+ value: 35.864000000000004
+ - type: mrr_at_5
+ value: 36.919999999999995
+ - type: ndcg_at_1
+ value: 29.566
+ - type: ndcg_at_10
+ value: 39.127
+ - type: ndcg_at_100
+ value: 44.989000000000004
+ - type: ndcg_at_1000
+ value: 47.189
+ - type: ndcg_at_3
+ value: 34.039
+ - type: ndcg_at_5
+ value: 35.744
+ - type: precision_at_1
+ value: 29.566
+ - type: precision_at_10
+ value: 7.385999999999999
+ - type: precision_at_100
+ value: 1.204
+ - type: precision_at_1000
+ value: 0.158
+ - type: precision_at_3
+ value: 16.286
+ - type: precision_at_5
+ value: 11.484
+ - type: recall_at_1
+ value: 24.117
+ - type: recall_at_10
+ value: 51.559999999999995
+ - type: recall_at_100
+ value: 77.104
+ - type: recall_at_1000
+ value: 91.79899999999999
+ - type: recall_at_3
+ value: 36.82
+ - type: recall_at_5
+ value: 41.453
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 25.17625
+ - type: map_at_10
+ value: 34.063916666666664
+ - type: map_at_100
+ value: 35.255500000000005
+ - type: map_at_1000
+ value: 35.37275
+ - type: map_at_3
+ value: 31.351666666666667
+ - type: map_at_5
+ value: 32.80608333333333
+ - type: mrr_at_1
+ value: 29.59783333333333
+ - type: mrr_at_10
+ value: 38.0925
+ - type: mrr_at_100
+ value: 38.957249999999995
+ - type: mrr_at_1000
+ value: 39.01608333333333
+ - type: mrr_at_3
+ value: 35.77625
+ - type: mrr_at_5
+ value: 37.04991666666667
+ - type: ndcg_at_1
+ value: 29.59783333333333
+ - type: ndcg_at_10
+ value: 39.343666666666664
+ - type: ndcg_at_100
+ value: 44.488249999999994
+ - type: ndcg_at_1000
+ value: 46.83358333333334
+ - type: ndcg_at_3
+ value: 34.69708333333333
+ - type: ndcg_at_5
+ value: 36.75075
+ - type: precision_at_1
+ value: 29.59783333333333
+ - type: precision_at_10
+ value: 6.884083333333332
+ - type: precision_at_100
+ value: 1.114
+ - type: precision_at_1000
+ value: 0.15108333333333332
+ - type: precision_at_3
+ value: 15.965250000000003
+ - type: precision_at_5
+ value: 11.246500000000001
+ - type: recall_at_1
+ value: 25.17625
+ - type: recall_at_10
+ value: 51.015999999999984
+ - type: recall_at_100
+ value: 73.60174999999998
+ - type: recall_at_1000
+ value: 89.849
+ - type: recall_at_3
+ value: 37.88399999999999
+ - type: recall_at_5
+ value: 43.24541666666666
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackStatsRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 24.537
+ - type: map_at_10
+ value: 31.081999999999997
+ - type: map_at_100
+ value: 32.042
+ - type: map_at_1000
+ value: 32.141
+ - type: map_at_3
+ value: 29.137
+ - type: map_at_5
+ value: 30.079
+ - type: mrr_at_1
+ value: 27.454
+ - type: mrr_at_10
+ value: 33.694
+ - type: mrr_at_100
+ value: 34.579
+ - type: mrr_at_1000
+ value: 34.649
+ - type: mrr_at_3
+ value: 32.004
+ - type: mrr_at_5
+ value: 32.794000000000004
+ - type: ndcg_at_1
+ value: 27.454
+ - type: ndcg_at_10
+ value: 34.915
+ - type: ndcg_at_100
+ value: 39.641
+ - type: ndcg_at_1000
+ value: 42.105
+ - type: ndcg_at_3
+ value: 31.276
+ - type: ndcg_at_5
+ value: 32.65
+ - type: precision_at_1
+ value: 27.454
+ - type: precision_at_10
+ value: 5.337
+ - type: precision_at_100
+ value: 0.8250000000000001
+ - type: precision_at_1000
+ value: 0.11199999999999999
+ - type: precision_at_3
+ value: 13.241
+ - type: precision_at_5
+ value: 8.895999999999999
+ - type: recall_at_1
+ value: 24.537
+ - type: recall_at_10
+ value: 44.324999999999996
+ - type: recall_at_100
+ value: 65.949
+ - type: recall_at_1000
+ value: 84.017
+ - type: recall_at_3
+ value: 33.857
+ - type: recall_at_5
+ value: 37.316
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackTexRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 17.122
+ - type: map_at_10
+ value: 24.32
+ - type: map_at_100
+ value: 25.338
+ - type: map_at_1000
+ value: 25.462
+ - type: map_at_3
+ value: 22.064
+ - type: map_at_5
+ value: 23.322000000000003
+ - type: mrr_at_1
+ value: 20.647
+ - type: mrr_at_10
+ value: 27.858
+ - type: mrr_at_100
+ value: 28.743999999999996
+ - type: mrr_at_1000
+ value: 28.819
+ - type: mrr_at_3
+ value: 25.769
+ - type: mrr_at_5
+ value: 26.964
+ - type: ndcg_at_1
+ value: 20.647
+ - type: ndcg_at_10
+ value: 28.849999999999998
+ - type: ndcg_at_100
+ value: 33.849000000000004
+ - type: ndcg_at_1000
+ value: 36.802
+ - type: ndcg_at_3
+ value: 24.799
+ - type: ndcg_at_5
+ value: 26.682
+ - type: precision_at_1
+ value: 20.647
+ - type: precision_at_10
+ value: 5.2170000000000005
+ - type: precision_at_100
+ value: 0.906
+ - type: precision_at_1000
+ value: 0.134
+ - type: precision_at_3
+ value: 11.769
+ - type: precision_at_5
+ value: 8.486
+ - type: recall_at_1
+ value: 17.122
+ - type: recall_at_10
+ value: 38.999
+ - type: recall_at_100
+ value: 61.467000000000006
+ - type: recall_at_1000
+ value: 82.716
+ - type: recall_at_3
+ value: 27.601
+ - type: recall_at_5
+ value: 32.471
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackUnixRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 24.396
+ - type: map_at_10
+ value: 33.415
+ - type: map_at_100
+ value: 34.521
+ - type: map_at_1000
+ value: 34.631
+ - type: map_at_3
+ value: 30.703999999999997
+ - type: map_at_5
+ value: 32.166
+ - type: mrr_at_1
+ value: 28.825
+ - type: mrr_at_10
+ value: 37.397000000000006
+ - type: mrr_at_100
+ value: 38.286
+ - type: mrr_at_1000
+ value: 38.346000000000004
+ - type: mrr_at_3
+ value: 35.028
+ - type: mrr_at_5
+ value: 36.32
+ - type: ndcg_at_1
+ value: 28.825
+ - type: ndcg_at_10
+ value: 38.656
+ - type: ndcg_at_100
+ value: 43.856
+ - type: ndcg_at_1000
+ value: 46.31
+ - type: ndcg_at_3
+ value: 33.793
+ - type: ndcg_at_5
+ value: 35.909
+ - type: precision_at_1
+ value: 28.825
+ - type: precision_at_10
+ value: 6.567
+ - type: precision_at_100
+ value: 1.0330000000000001
+ - type: precision_at_1000
+ value: 0.135
+ - type: precision_at_3
+ value: 15.516
+ - type: precision_at_5
+ value: 10.914
+ - type: recall_at_1
+ value: 24.396
+ - type: recall_at_10
+ value: 50.747
+ - type: recall_at_100
+ value: 73.477
+ - type: recall_at_1000
+ value: 90.801
+ - type: recall_at_3
+ value: 37.1
+ - type: recall_at_5
+ value: 42.589
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackWebmastersRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 25.072
+ - type: map_at_10
+ value: 34.307
+ - type: map_at_100
+ value: 35.725
+ - type: map_at_1000
+ value: 35.943999999999996
+ - type: map_at_3
+ value: 30.906
+ - type: map_at_5
+ value: 32.818000000000005
+ - type: mrr_at_1
+ value: 29.644
+ - type: mrr_at_10
+ value: 38.673
+ - type: mrr_at_100
+ value: 39.459
+ - type: mrr_at_1000
+ value: 39.527
+ - type: mrr_at_3
+ value: 35.771
+ - type: mrr_at_5
+ value: 37.332
+ - type: ndcg_at_1
+ value: 29.644
+ - type: ndcg_at_10
+ value: 40.548
+ - type: ndcg_at_100
+ value: 45.678999999999995
+ - type: ndcg_at_1000
+ value: 48.488
+ - type: ndcg_at_3
+ value: 34.887
+ - type: ndcg_at_5
+ value: 37.543
+ - type: precision_at_1
+ value: 29.644
+ - type: precision_at_10
+ value: 7.688000000000001
+ - type: precision_at_100
+ value: 1.482
+ - type: precision_at_1000
+ value: 0.23600000000000002
+ - type: precision_at_3
+ value: 16.206
+ - type: precision_at_5
+ value: 12.016
+ - type: recall_at_1
+ value: 25.072
+ - type: recall_at_10
+ value: 53.478
+ - type: recall_at_100
+ value: 76.07300000000001
+ - type: recall_at_1000
+ value: 93.884
+ - type: recall_at_3
+ value: 37.583
+ - type: recall_at_5
+ value: 44.464
+ - task:
+ type: Retrieval
+ dataset:
+ type: BeIR/cqadupstack
+ name: MTEB CQADupstackWordpressRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 20.712
+ - type: map_at_10
+ value: 27.467999999999996
+ - type: map_at_100
+ value: 28.502
+ - type: map_at_1000
+ value: 28.610000000000003
+ - type: map_at_3
+ value: 24.887999999999998
+ - type: map_at_5
+ value: 26.273999999999997
+ - type: mrr_at_1
+ value: 22.736
+ - type: mrr_at_10
+ value: 29.553
+ - type: mrr_at_100
+ value: 30.485
+ - type: mrr_at_1000
+ value: 30.56
+ - type: mrr_at_3
+ value: 27.078999999999997
+ - type: mrr_at_5
+ value: 28.401
+ - type: ndcg_at_1
+ value: 22.736
+ - type: ndcg_at_10
+ value: 32.023
+ - type: ndcg_at_100
+ value: 37.158
+ - type: ndcg_at_1000
+ value: 39.823
+ - type: ndcg_at_3
+ value: 26.951999999999998
+ - type: ndcg_at_5
+ value: 29.281000000000002
+ - type: precision_at_1
+ value: 22.736
+ - type: precision_at_10
+ value: 5.213
+ - type: precision_at_100
+ value: 0.832
+ - type: precision_at_1000
+ value: 0.116
+ - type: precision_at_3
+ value: 11.459999999999999
+ - type: precision_at_5
+ value: 8.244
+ - type: recall_at_1
+ value: 20.712
+ - type: recall_at_10
+ value: 44.057
+ - type: recall_at_100
+ value: 67.944
+ - type: recall_at_1000
+ value: 87.925
+ - type: recall_at_3
+ value: 30.305
+ - type: recall_at_5
+ value: 36.071999999999996
+ - task:
+ type: Retrieval
+ dataset:
+ type: climate-fever
+ name: MTEB ClimateFEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 10.181999999999999
+ - type: map_at_10
+ value: 16.66
+ - type: map_at_100
+ value: 18.273
+ - type: map_at_1000
+ value: 18.45
+ - type: map_at_3
+ value: 14.141
+ - type: map_at_5
+ value: 15.455
+ - type: mrr_at_1
+ value: 22.15
+ - type: mrr_at_10
+ value: 32.062000000000005
+ - type: mrr_at_100
+ value: 33.116
+ - type: mrr_at_1000
+ value: 33.168
+ - type: mrr_at_3
+ value: 28.827
+ - type: mrr_at_5
+ value: 30.892999999999997
+ - type: ndcg_at_1
+ value: 22.15
+ - type: ndcg_at_10
+ value: 23.532
+ - type: ndcg_at_100
+ value: 30.358
+ - type: ndcg_at_1000
+ value: 33.783
+ - type: ndcg_at_3
+ value: 19.222
+ - type: ndcg_at_5
+ value: 20.919999999999998
+ - type: precision_at_1
+ value: 22.15
+ - type: precision_at_10
+ value: 7.185999999999999
+ - type: precision_at_100
+ value: 1.433
+ - type: precision_at_1000
+ value: 0.207
+ - type: precision_at_3
+ value: 13.941
+ - type: precision_at_5
+ value: 10.906
+ - type: recall_at_1
+ value: 10.181999999999999
+ - type: recall_at_10
+ value: 28.104000000000003
+ - type: recall_at_100
+ value: 51.998999999999995
+ - type: recall_at_1000
+ value: 71.311
+ - type: recall_at_3
+ value: 17.698
+ - type: recall_at_5
+ value: 22.262999999999998
+ - task:
+ type: Retrieval
+ dataset:
+ type: dbpedia-entity
+ name: MTEB DBPedia
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 6.669
+ - type: map_at_10
+ value: 15.552
+ - type: map_at_100
+ value: 21.865000000000002
+ - type: map_at_1000
+ value: 23.268
+ - type: map_at_3
+ value: 11.309
+ - type: map_at_5
+ value: 13.084000000000001
+ - type: mrr_at_1
+ value: 55.50000000000001
+ - type: mrr_at_10
+ value: 66.46600000000001
+ - type: mrr_at_100
+ value: 66.944
+ - type: mrr_at_1000
+ value: 66.956
+ - type: mrr_at_3
+ value: 64.542
+ - type: mrr_at_5
+ value: 65.717
+ - type: ndcg_at_1
+ value: 44.75
+ - type: ndcg_at_10
+ value: 35.049
+ - type: ndcg_at_100
+ value: 39.073
+ - type: ndcg_at_1000
+ value: 46.208
+ - type: ndcg_at_3
+ value: 39.525
+ - type: ndcg_at_5
+ value: 37.156
+ - type: precision_at_1
+ value: 55.50000000000001
+ - type: precision_at_10
+ value: 27.800000000000004
+ - type: precision_at_100
+ value: 9.013
+ - type: precision_at_1000
+ value: 1.8800000000000001
+ - type: precision_at_3
+ value: 42.667
+ - type: precision_at_5
+ value: 36.0
+ - type: recall_at_1
+ value: 6.669
+ - type: recall_at_10
+ value: 21.811
+ - type: recall_at_100
+ value: 45.112
+ - type: recall_at_1000
+ value: 67.806
+ - type: recall_at_3
+ value: 13.373
+ - type: recall_at_5
+ value: 16.615
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/emotion
+ name: MTEB EmotionClassification
+ config: default
+ split: test
+ revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+ metrics:
+ - type: accuracy
+ value: 48.769999999999996
+ - type: f1
+ value: 42.91448356376592
+ - task:
+ type: Retrieval
+ dataset:
+ type: fever
+ name: MTEB FEVER
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 54.013
+ - type: map_at_10
+ value: 66.239
+ - type: map_at_100
+ value: 66.62599999999999
+ - type: map_at_1000
+ value: 66.644
+ - type: map_at_3
+ value: 63.965
+ - type: map_at_5
+ value: 65.45400000000001
+ - type: mrr_at_1
+ value: 58.221000000000004
+ - type: mrr_at_10
+ value: 70.43700000000001
+ - type: mrr_at_100
+ value: 70.744
+ - type: mrr_at_1000
+ value: 70.75099999999999
+ - type: mrr_at_3
+ value: 68.284
+ - type: mrr_at_5
+ value: 69.721
+ - type: ndcg_at_1
+ value: 58.221000000000004
+ - type: ndcg_at_10
+ value: 72.327
+ - type: ndcg_at_100
+ value: 73.953
+ - type: ndcg_at_1000
+ value: 74.312
+ - type: ndcg_at_3
+ value: 68.062
+ - type: ndcg_at_5
+ value: 70.56400000000001
+ - type: precision_at_1
+ value: 58.221000000000004
+ - type: precision_at_10
+ value: 9.521
+ - type: precision_at_100
+ value: 1.045
+ - type: precision_at_1000
+ value: 0.109
+ - type: precision_at_3
+ value: 27.348
+ - type: precision_at_5
+ value: 17.794999999999998
+ - type: recall_at_1
+ value: 54.013
+ - type: recall_at_10
+ value: 86.957
+ - type: recall_at_100
+ value: 93.911
+ - type: recall_at_1000
+ value: 96.38
+ - type: recall_at_3
+ value: 75.555
+ - type: recall_at_5
+ value: 81.671
+ - task:
+ type: Retrieval
+ dataset:
+ type: fiqa
+ name: MTEB FiQA2018
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 21.254
+ - type: map_at_10
+ value: 33.723
+ - type: map_at_100
+ value: 35.574
+ - type: map_at_1000
+ value: 35.730000000000004
+ - type: map_at_3
+ value: 29.473
+ - type: map_at_5
+ value: 31.543
+ - type: mrr_at_1
+ value: 41.358
+ - type: mrr_at_10
+ value: 49.498
+ - type: mrr_at_100
+ value: 50.275999999999996
+ - type: mrr_at_1000
+ value: 50.308
+ - type: mrr_at_3
+ value: 47.016000000000005
+ - type: mrr_at_5
+ value: 48.336
+ - type: ndcg_at_1
+ value: 41.358
+ - type: ndcg_at_10
+ value: 41.579
+ - type: ndcg_at_100
+ value: 48.455
+ - type: ndcg_at_1000
+ value: 51.165000000000006
+ - type: ndcg_at_3
+ value: 37.681
+ - type: ndcg_at_5
+ value: 38.49
+ - type: precision_at_1
+ value: 41.358
+ - type: precision_at_10
+ value: 11.543000000000001
+ - type: precision_at_100
+ value: 1.87
+ - type: precision_at_1000
+ value: 0.23600000000000002
+ - type: precision_at_3
+ value: 24.743000000000002
+ - type: precision_at_5
+ value: 17.994
+ - type: recall_at_1
+ value: 21.254
+ - type: recall_at_10
+ value: 48.698
+ - type: recall_at_100
+ value: 74.588
+ - type: recall_at_1000
+ value: 91.00200000000001
+ - type: recall_at_3
+ value: 33.939
+ - type: recall_at_5
+ value: 39.367000000000004
+ - task:
+ type: Retrieval
+ dataset:
+ type: hotpotqa
+ name: MTEB HotpotQA
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 35.922
+ - type: map_at_10
+ value: 52.32599999999999
+ - type: map_at_100
+ value: 53.18000000000001
+ - type: map_at_1000
+ value: 53.245
+ - type: map_at_3
+ value: 49.294
+ - type: map_at_5
+ value: 51.202999999999996
+ - type: mrr_at_1
+ value: 71.843
+ - type: mrr_at_10
+ value: 78.24600000000001
+ - type: mrr_at_100
+ value: 78.515
+ - type: mrr_at_1000
+ value: 78.527
+ - type: mrr_at_3
+ value: 77.17500000000001
+ - type: mrr_at_5
+ value: 77.852
+ - type: ndcg_at_1
+ value: 71.843
+ - type: ndcg_at_10
+ value: 61.379
+ - type: ndcg_at_100
+ value: 64.535
+ - type: ndcg_at_1000
+ value: 65.888
+ - type: ndcg_at_3
+ value: 56.958
+ - type: ndcg_at_5
+ value: 59.434
+ - type: precision_at_1
+ value: 71.843
+ - type: precision_at_10
+ value: 12.686
+ - type: precision_at_100
+ value: 1.517
+ - type: precision_at_1000
+ value: 0.16999999999999998
+ - type: precision_at_3
+ value: 35.778
+ - type: precision_at_5
+ value: 23.422
+ - type: recall_at_1
+ value: 35.922
+ - type: recall_at_10
+ value: 63.43
+ - type: recall_at_100
+ value: 75.868
+ - type: recall_at_1000
+ value: 84.88900000000001
+ - type: recall_at_3
+ value: 53.666000000000004
+ - type: recall_at_5
+ value: 58.555
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/imdb
+ name: MTEB ImdbClassification
+ config: default
+ split: test
+ revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+ metrics:
+ - type: accuracy
+ value: 79.4408
+ - type: ap
+ value: 73.52820871620366
+ - type: f1
+ value: 79.36240238685001
+ - task:
+ type: Retrieval
+ dataset:
+ type: msmarco
+ name: MTEB MSMARCO
+ config: default
+ split: dev
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 21.826999999999998
+ - type: map_at_10
+ value: 34.04
+ - type: map_at_100
+ value: 35.226
+ - type: map_at_1000
+ value: 35.275
+ - type: map_at_3
+ value: 30.165999999999997
+ - type: map_at_5
+ value: 32.318000000000005
+ - type: mrr_at_1
+ value: 22.464000000000002
+ - type: mrr_at_10
+ value: 34.631
+ - type: mrr_at_100
+ value: 35.752
+ - type: mrr_at_1000
+ value: 35.795
+ - type: mrr_at_3
+ value: 30.798
+ - type: mrr_at_5
+ value: 32.946999999999996
+ - type: ndcg_at_1
+ value: 22.464000000000002
+ - type: ndcg_at_10
+ value: 40.919
+ - type: ndcg_at_100
+ value: 46.632
+ - type: ndcg_at_1000
+ value: 47.833
+ - type: ndcg_at_3
+ value: 32.992
+ - type: ndcg_at_5
+ value: 36.834
+ - type: precision_at_1
+ value: 22.464000000000002
+ - type: precision_at_10
+ value: 6.494
+ - type: precision_at_100
+ value: 0.9369999999999999
+ - type: precision_at_1000
+ value: 0.104
+ - type: precision_at_3
+ value: 14.021
+ - type: precision_at_5
+ value: 10.347000000000001
+ - type: recall_at_1
+ value: 21.826999999999998
+ - type: recall_at_10
+ value: 62.132
+ - type: recall_at_100
+ value: 88.55199999999999
+ - type: recall_at_1000
+ value: 97.707
+ - type: recall_at_3
+ value: 40.541
+ - type: recall_at_5
+ value: 49.739
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_domain
+ name: MTEB MTOPDomainClassification (en)
+ config: en
+ split: test
+ revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+ metrics:
+ - type: accuracy
+ value: 95.68399452804377
+ - type: f1
+ value: 95.25490609832268
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/mtop_intent
+ name: MTEB MTOPIntentClassification (en)
+ config: en
+ split: test
+ revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+ metrics:
+ - type: accuracy
+ value: 83.15321477428182
+ - type: f1
+ value: 60.35476439087966
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_intent
+ name: MTEB MassiveIntentClassification (en)
+ config: en
+ split: test
+ revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+ metrics:
+ - type: accuracy
+ value: 71.92669804976462
+ - type: f1
+ value: 69.22815107207565
+ - task:
+ type: Classification
+ dataset:
+ type: mteb/amazon_massive_scenario
+ name: MTEB MassiveScenarioClassification (en)
+ config: en
+ split: test
+ revision: 7d571f92784cd94a019292a1f45445077d0ef634
+ metrics:
+ - type: accuracy
+ value: 74.4855413584398
+ - type: f1
+ value: 72.92107516103387
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-p2p
+ name: MTEB MedrxivClusteringP2P
+ config: default
+ split: test
+ revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+ metrics:
+ - type: v_measure
+ value: 32.412679360205544
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/medrxiv-clustering-s2s
+ name: MTEB MedrxivClusteringS2S
+ config: default
+ split: test
+ revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+ metrics:
+ - type: v_measure
+ value: 28.09211869875204
+ - task:
+ type: Reranking
+ dataset:
+ type: mteb/mind_small
+ name: MTEB MindSmallReranking
+ config: default
+ split: test
+ revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+ metrics:
+ - type: map
+ value: 30.540919056982545
+ - type: mrr
+ value: 31.529904607063536
+ - task:
+ type: Retrieval
+ dataset:
+ type: nfcorpus
+ name: MTEB NFCorpus
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 5.745
+ - type: map_at_10
+ value: 12.013
+ - type: map_at_100
+ value: 15.040000000000001
+ - type: map_at_1000
+ value: 16.427
+ - type: map_at_3
+ value: 8.841000000000001
+ - type: map_at_5
+ value: 10.289
+ - type: mrr_at_1
+ value: 45.201
+ - type: mrr_at_10
+ value: 53.483999999999995
+ - type: mrr_at_100
+ value: 54.20700000000001
+ - type: mrr_at_1000
+ value: 54.252
+ - type: mrr_at_3
+ value: 51.29
+ - type: mrr_at_5
+ value: 52.73
+ - type: ndcg_at_1
+ value: 43.808
+ - type: ndcg_at_10
+ value: 32.445
+ - type: ndcg_at_100
+ value: 30.031000000000002
+ - type: ndcg_at_1000
+ value: 39.007
+ - type: ndcg_at_3
+ value: 37.204
+ - type: ndcg_at_5
+ value: 35.07
+ - type: precision_at_1
+ value: 45.201
+ - type: precision_at_10
+ value: 23.684
+ - type: precision_at_100
+ value: 7.600999999999999
+ - type: precision_at_1000
+ value: 2.043
+ - type: precision_at_3
+ value: 33.953
+ - type: precision_at_5
+ value: 29.412
+ - type: recall_at_1
+ value: 5.745
+ - type: recall_at_10
+ value: 16.168
+ - type: recall_at_100
+ value: 30.875999999999998
+ - type: recall_at_1000
+ value: 62.686
+ - type: recall_at_3
+ value: 9.75
+ - type: recall_at_5
+ value: 12.413
+ - task:
+ type: Retrieval
+ dataset:
+ type: nq
+ name: MTEB NQ
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 37.828
+ - type: map_at_10
+ value: 53.239000000000004
+ - type: map_at_100
+ value: 54.035999999999994
+ - type: map_at_1000
+ value: 54.067
+ - type: map_at_3
+ value: 49.289
+ - type: map_at_5
+ value: 51.784
+ - type: mrr_at_1
+ value: 42.497
+ - type: mrr_at_10
+ value: 55.916999999999994
+ - type: mrr_at_100
+ value: 56.495
+ - type: mrr_at_1000
+ value: 56.516999999999996
+ - type: mrr_at_3
+ value: 52.800000000000004
+ - type: mrr_at_5
+ value: 54.722
+ - type: ndcg_at_1
+ value: 42.468
+ - type: ndcg_at_10
+ value: 60.437
+ - type: ndcg_at_100
+ value: 63.731
+ - type: ndcg_at_1000
+ value: 64.41799999999999
+ - type: ndcg_at_3
+ value: 53.230999999999995
+ - type: ndcg_at_5
+ value: 57.26
+ - type: precision_at_1
+ value: 42.468
+ - type: precision_at_10
+ value: 9.47
+ - type: precision_at_100
+ value: 1.1360000000000001
+ - type: precision_at_1000
+ value: 0.12
+ - type: precision_at_3
+ value: 23.724999999999998
+ - type: precision_at_5
+ value: 16.593
+ - type: recall_at_1
+ value: 37.828
+ - type: recall_at_10
+ value: 79.538
+ - type: recall_at_100
+ value: 93.646
+ - type: recall_at_1000
+ value: 98.72999999999999
+ - type: recall_at_3
+ value: 61.134
+ - type: recall_at_5
+ value: 70.377
+ - task:
+ type: Retrieval
+ dataset:
+ type: quora
+ name: MTEB QuoraRetrieval
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 70.548
+ - type: map_at_10
+ value: 84.466
+ - type: map_at_100
+ value: 85.10600000000001
+ - type: map_at_1000
+ value: 85.123
+ - type: map_at_3
+ value: 81.57600000000001
+ - type: map_at_5
+ value: 83.399
+ - type: mrr_at_1
+ value: 81.24
+ - type: mrr_at_10
+ value: 87.457
+ - type: mrr_at_100
+ value: 87.574
+ - type: mrr_at_1000
+ value: 87.575
+ - type: mrr_at_3
+ value: 86.507
+ - type: mrr_at_5
+ value: 87.205
+ - type: ndcg_at_1
+ value: 81.25
+ - type: ndcg_at_10
+ value: 88.203
+ - type: ndcg_at_100
+ value: 89.457
+ - type: ndcg_at_1000
+ value: 89.563
+ - type: ndcg_at_3
+ value: 85.465
+ - type: ndcg_at_5
+ value: 87.007
+ - type: precision_at_1
+ value: 81.25
+ - type: precision_at_10
+ value: 13.373
+ - type: precision_at_100
+ value: 1.5270000000000001
+ - type: precision_at_1000
+ value: 0.157
+ - type: precision_at_3
+ value: 37.417
+ - type: precision_at_5
+ value: 24.556
+ - type: recall_at_1
+ value: 70.548
+ - type: recall_at_10
+ value: 95.208
+ - type: recall_at_100
+ value: 99.514
+ - type: recall_at_1000
+ value: 99.988
+ - type: recall_at_3
+ value: 87.214
+ - type: recall_at_5
+ value: 91.696
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering
+ name: MTEB RedditClustering
+ config: default
+ split: test
+ revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+ metrics:
+ - type: v_measure
+ value: 53.04822095496839
+ - task:
+ type: Clustering
+ dataset:
+ type: mteb/reddit-clustering-p2p
+ name: MTEB RedditClusteringP2P
+ config: default
+ split: test
+ revision: 282350215ef01743dc01b456c7f5241fa8937f16
+ metrics:
+ - type: v_measure
+ value: 60.30778476474675
+ - task:
+ type: Retrieval
+ dataset:
+ type: scidocs
+ name: MTEB SCIDOCS
+ config: default
+ split: test
+ revision: None
+ metrics:
+ - type: map_at_1
+ value: 4.692
+ - type: map_at_10
+ value: 11.766
+ - type: map_at_100
+ value: 13.904
+ - type: map_at_1000
+ value: 14.216999999999999
+ - type: map_at_3
+ value: 8.245
+ - type: map_at_5
+ value: 9.92
+ - type: mrr_at_1
+ value: 23.0
+ - type: mrr_at_10
+ value: 33.78
+ - type: mrr_at_100
+ value: 34.922
+ - type: mrr_at_1000
+ value: 34.973
+ - type: mrr_at_3
+ value: 30.2
+ - type: mrr_at_5
+ value: 32.565
+ - type: ndcg_at_1
+ value: 23.0
+ - type: ndcg_at_10
+ value: 19.863
+ - type: ndcg_at_100
+ value: 28.141
+ - type: ndcg_at_1000
+ value: 33.549
+ - type: ndcg_at_3
+ value: 18.434
+ - type: ndcg_at_5
+ value: 16.384
+ - type: precision_at_1
+ value: 23.0
+ - type: precision_at_10
+ value: 10.39
+ - type: precision_at_100
+ value: 2.235
+ - type: precision_at_1000
+ value: 0.35300000000000004
+ - type: precision_at_3
+ value: 17.133000000000003
+ - type: precision_at_5
+ value: 14.44
+ - type: recall_at_1
+ value: 4.692
+ - type: recall_at_10
+ value: 21.025
+ - type: recall_at_100
+ value: 45.324999999999996
+ - type: recall_at_1000
+ value: 71.675
+ - type: recall_at_3
+ value: 10.440000000000001
+ - type: recall_at_5
+ value: 14.64
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sickr-sts
+ name: MTEB SICK-R
+ config: default
+ split: test
+ revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.96178184892842
+ - type: cos_sim_spearman
+ value: 79.6487740813199
+ - type: euclidean_pearson
+ value: 82.06661161625023
+ - type: euclidean_spearman
+ value: 79.64876769031183
+ - type: manhattan_pearson
+ value: 82.07061164575131
+ - type: manhattan_spearman
+ value: 79.65197039464537
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts12-sts
+ name: MTEB STS12
+ config: default
+ split: test
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.15305604100027
+ - type: cos_sim_spearman
+ value: 74.27447427941591
+ - type: euclidean_pearson
+ value: 80.52737337565307
+ - type: euclidean_spearman
+ value: 74.27416077132192
+ - type: manhattan_pearson
+ value: 80.53728571140387
+ - type: manhattan_spearman
+ value: 74.28853605753457
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts13-sts
+ name: MTEB STS13
+ config: default
+ split: test
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
+ metrics:
+ - type: cos_sim_pearson
+ value: 83.44386080639279
+ - type: cos_sim_spearman
+ value: 84.17947648159536
+ - type: euclidean_pearson
+ value: 83.34145388129387
+ - type: euclidean_spearman
+ value: 84.17947648159536
+ - type: manhattan_pearson
+ value: 83.30699061927966
+ - type: manhattan_spearman
+ value: 84.18125737380451
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts14-sts
+ name: MTEB STS14
+ config: default
+ split: test
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
+ metrics:
+ - type: cos_sim_pearson
+ value: 81.57392220985612
+ - type: cos_sim_spearman
+ value: 78.80745014464101
+ - type: euclidean_pearson
+ value: 80.01660371487199
+ - type: euclidean_spearman
+ value: 78.80741240102256
+ - type: manhattan_pearson
+ value: 79.96810779507953
+ - type: manhattan_spearman
+ value: 78.75600400119448
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts15-sts
+ name: MTEB STS15
+ config: default
+ split: test
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
+ metrics:
+ - type: cos_sim_pearson
+ value: 86.85421063026625
+ - type: cos_sim_spearman
+ value: 87.55320285299192
+ - type: euclidean_pearson
+ value: 86.69750143323517
+ - type: euclidean_spearman
+ value: 87.55320284326378
+ - type: manhattan_pearson
+ value: 86.63379169960379
+ - type: manhattan_spearman
+ value: 87.4815029877984
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts16-sts
+ name: MTEB STS16
+ config: default
+ split: test
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
+ metrics:
+ - type: cos_sim_pearson
+ value: 84.31314130411842
+ - type: cos_sim_spearman
+ value: 85.3489588181433
+ - type: euclidean_pearson
+ value: 84.13240933463535
+ - type: euclidean_spearman
+ value: 85.34902871403281
+ - type: manhattan_pearson
+ value: 84.01183086503559
+ - type: manhattan_spearman
+ value: 85.19316703166102
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts17-crosslingual-sts
+ name: MTEB STS17 (en-en)
+ config: en-en
+ split: test
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
+ metrics:
+ - type: cos_sim_pearson
+ value: 89.09979781689536
+ - type: cos_sim_spearman
+ value: 88.87813323759015
+ - type: euclidean_pearson
+ value: 88.65413031123792
+ - type: euclidean_spearman
+ value: 88.87813323759015
+ - type: manhattan_pearson
+ value: 88.61818758256024
+ - type: manhattan_spearman
+ value: 88.81044100494604
+ - task:
+ type: STS
+ dataset:
+ type: mteb/sts22-crosslingual-sts
+ name: MTEB STS22 (en)
+ config: en
+ split: test
+ revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
2097
+ metrics:
2098
+ - type: cos_sim_pearson
2099
+ value: 62.30693258111531
2100
+ - type: cos_sim_spearman
2101
+ value: 62.195516523251946
2102
+ - type: euclidean_pearson
2103
+ value: 62.951283701049476
2104
+ - type: euclidean_spearman
2105
+ value: 62.195516523251946
2106
+ - type: manhattan_pearson
2107
+ value: 63.068322281439535
2108
+ - type: manhattan_spearman
2109
+ value: 62.10621171028406
2110
+ - task:
2111
+ type: STS
2112
+ dataset:
2113
+ type: mteb/stsbenchmark-sts
2114
+ name: MTEB STSBenchmark
2115
+ config: default
2116
+ split: test
2117
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
2118
+ metrics:
2119
+ - type: cos_sim_pearson
2120
+ value: 84.27092833763909
2121
+ - type: cos_sim_spearman
2122
+ value: 84.84429717949759
2123
+ - type: euclidean_pearson
2124
+ value: 84.8516966060792
2125
+ - type: euclidean_spearman
2126
+ value: 84.84429717949759
2127
+ - type: manhattan_pearson
2128
+ value: 84.82203139242881
2129
+ - type: manhattan_spearman
2130
+ value: 84.8358503952945
2131
+ - task:
2132
+ type: Reranking
2133
+ dataset:
2134
+ type: mteb/scidocs-reranking
2135
+ name: MTEB SciDocsRR
2136
+ config: default
2137
+ split: test
2138
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
2139
+ metrics:
2140
+ - type: map
2141
+ value: 83.10290863981409
2142
+ - type: mrr
2143
+ value: 95.31168450286097
2144
+ - task:
2145
+ type: Retrieval
2146
+ dataset:
2147
+ type: scifact
2148
+ name: MTEB SciFact
2149
+ config: default
2150
+ split: test
2151
+ revision: None
2152
+ metrics:
2153
+ - type: map_at_1
2154
+ value: 52.161
2155
+ - type: map_at_10
2156
+ value: 62.138000000000005
2157
+ - type: map_at_100
2158
+ value: 62.769
2159
+ - type: map_at_1000
2160
+ value: 62.812
2161
+ - type: map_at_3
2162
+ value: 59.111000000000004
2163
+ - type: map_at_5
2164
+ value: 60.995999999999995
2165
+ - type: mrr_at_1
2166
+ value: 55.333
2167
+ - type: mrr_at_10
2168
+ value: 63.504000000000005
2169
+ - type: mrr_at_100
2170
+ value: 64.036
2171
+ - type: mrr_at_1000
2172
+ value: 64.08
2173
+ - type: mrr_at_3
2174
+ value: 61.278
2175
+ - type: mrr_at_5
2176
+ value: 62.778
2177
+ - type: ndcg_at_1
2178
+ value: 55.333
2179
+ - type: ndcg_at_10
2180
+ value: 66.678
2181
+ - type: ndcg_at_100
2182
+ value: 69.415
2183
+ - type: ndcg_at_1000
2184
+ value: 70.453
2185
+ - type: ndcg_at_3
2186
+ value: 61.755
2187
+ - type: ndcg_at_5
2188
+ value: 64.546
2189
+ - type: precision_at_1
2190
+ value: 55.333
2191
+ - type: precision_at_10
2192
+ value: 9.033
2193
+ - type: precision_at_100
2194
+ value: 1.043
2195
+ - type: precision_at_1000
2196
+ value: 0.11199999999999999
2197
+ - type: precision_at_3
2198
+ value: 24.221999999999998
2199
+ - type: precision_at_5
2200
+ value: 16.333000000000002
2201
+ - type: recall_at_1
2202
+ value: 52.161
2203
+ - type: recall_at_10
2204
+ value: 79.156
2205
+ - type: recall_at_100
2206
+ value: 91.333
2207
+ - type: recall_at_1000
2208
+ value: 99.333
2209
+ - type: recall_at_3
2210
+ value: 66.43299999999999
2211
+ - type: recall_at_5
2212
+ value: 73.272
2213
+ - task:
2214
+ type: PairClassification
2215
+ dataset:
2216
+ type: mteb/sprintduplicatequestions-pairclassification
2217
+ name: MTEB SprintDuplicateQuestions
2218
+ config: default
2219
+ split: test
2220
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
2221
+ metrics:
2222
+ - type: cos_sim_accuracy
2223
+ value: 99.81287128712871
2224
+ - type: cos_sim_ap
2225
+ value: 95.30034785910676
2226
+ - type: cos_sim_f1
2227
+ value: 90.28629856850716
2228
+ - type: cos_sim_precision
2229
+ value: 92.36401673640168
2230
+ - type: cos_sim_recall
2231
+ value: 88.3
2232
+ - type: dot_accuracy
2233
+ value: 99.81287128712871
2234
+ - type: dot_ap
2235
+ value: 95.30034785910676
2236
+ - type: dot_f1
2237
+ value: 90.28629856850716
2238
+ - type: dot_precision
2239
+ value: 92.36401673640168
2240
+ - type: dot_recall
2241
+ value: 88.3
2242
+ - type: euclidean_accuracy
2243
+ value: 99.81287128712871
2244
+ - type: euclidean_ap
2245
+ value: 95.30034785910676
2246
+ - type: euclidean_f1
2247
+ value: 90.28629856850716
2248
+ - type: euclidean_precision
2249
+ value: 92.36401673640168
2250
+ - type: euclidean_recall
2251
+ value: 88.3
2252
+ - type: manhattan_accuracy
2253
+ value: 99.80990099009901
2254
+ - type: manhattan_ap
2255
+ value: 95.26880751950654
2256
+ - type: manhattan_f1
2257
+ value: 90.22177419354838
2258
+ - type: manhattan_precision
2259
+ value: 90.95528455284553
2260
+ - type: manhattan_recall
2261
+ value: 89.5
2262
+ - type: max_accuracy
2263
+ value: 99.81287128712871
2264
+ - type: max_ap
2265
+ value: 95.30034785910676
2266
+ - type: max_f1
2267
+ value: 90.28629856850716
2268
+ - task:
2269
+ type: Clustering
2270
+ dataset:
2271
+ type: mteb/stackexchange-clustering
2272
+ name: MTEB StackExchangeClustering
2273
+ config: default
2274
+ split: test
2275
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
2276
+ metrics:
2277
+ - type: v_measure
2278
+ value: 58.518662504351184
2279
+ - task:
2280
+ type: Clustering
2281
+ dataset:
2282
+ type: mteb/stackexchange-clustering-p2p
2283
+ name: MTEB StackExchangeClusteringP2P
2284
+ config: default
2285
+ split: test
2286
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
2287
+ metrics:
2288
+ - type: v_measure
2289
+ value: 34.96168178378587
2290
+ - task:
2291
+ type: Reranking
2292
+ dataset:
2293
+ type: mteb/stackoverflowdupquestions-reranking
2294
+ name: MTEB StackOverflowDupQuestions
2295
+ config: default
2296
+ split: test
2297
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
2298
+ metrics:
2299
+ - type: map
2300
+ value: 52.04862593471896
2301
+ - type: mrr
2302
+ value: 52.97238402936932
2303
+ - task:
2304
+ type: Summarization
2305
+ dataset:
2306
+ type: mteb/summeval
2307
+ name: MTEB SummEval
2308
+ config: default
2309
+ split: test
2310
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
2311
+ metrics:
2312
+ - type: cos_sim_pearson
2313
+ value: 30.092545236479946
2314
+ - type: cos_sim_spearman
2315
+ value: 31.599851000175498
2316
+ - type: dot_pearson
2317
+ value: 30.092542723901676
2318
+ - type: dot_spearman
2319
+ value: 31.599851000175498
2320
+ - task:
2321
+ type: Retrieval
2322
+ dataset:
2323
+ type: trec-covid
2324
+ name: MTEB TRECCOVID
2325
+ config: default
2326
+ split: test
2327
+ revision: None
2328
+ metrics:
2329
+ - type: map_at_1
2330
+ value: 0.189
2331
+ - type: map_at_10
2332
+ value: 1.662
2333
+ - type: map_at_100
2334
+ value: 9.384
2335
+ - type: map_at_1000
2336
+ value: 22.669
2337
+ - type: map_at_3
2338
+ value: 0.5559999999999999
2339
+ - type: map_at_5
2340
+ value: 0.9039999999999999
2341
+ - type: mrr_at_1
2342
+ value: 68.0
2343
+ - type: mrr_at_10
2344
+ value: 81.01899999999999
2345
+ - type: mrr_at_100
2346
+ value: 81.01899999999999
2347
+ - type: mrr_at_1000
2348
+ value: 81.01899999999999
2349
+ - type: mrr_at_3
2350
+ value: 79.333
2351
+ - type: mrr_at_5
2352
+ value: 80.733
2353
+ - type: ndcg_at_1
2354
+ value: 63.0
2355
+ - type: ndcg_at_10
2356
+ value: 65.913
2357
+ - type: ndcg_at_100
2358
+ value: 51.895
2359
+ - type: ndcg_at_1000
2360
+ value: 46.967
2361
+ - type: ndcg_at_3
2362
+ value: 65.49199999999999
2363
+ - type: ndcg_at_5
2364
+ value: 66.69699999999999
2365
+ - type: precision_at_1
2366
+ value: 68.0
2367
+ - type: precision_at_10
2368
+ value: 71.6
2369
+ - type: precision_at_100
2370
+ value: 53.66
2371
+ - type: precision_at_1000
2372
+ value: 21.124000000000002
2373
+ - type: precision_at_3
2374
+ value: 72.667
2375
+ - type: precision_at_5
2376
+ value: 74.0
2377
+ - type: recall_at_1
2378
+ value: 0.189
2379
+ - type: recall_at_10
2380
+ value: 1.913
2381
+ - type: recall_at_100
2382
+ value: 12.601999999999999
2383
+ - type: recall_at_1000
2384
+ value: 44.296
2385
+ - type: recall_at_3
2386
+ value: 0.605
2387
+ - type: recall_at_5
2388
+ value: 1.018
2389
+ - task:
2390
+ type: Retrieval
2391
+ dataset:
2392
+ type: webis-touche2020
2393
+ name: MTEB Touche2020
2394
+ config: default
2395
+ split: test
2396
+ revision: None
2397
+ metrics:
2398
+ - type: map_at_1
2399
+ value: 2.701
2400
+ - type: map_at_10
2401
+ value: 10.445
2402
+ - type: map_at_100
2403
+ value: 17.324
2404
+ - type: map_at_1000
2405
+ value: 19.161
2406
+ - type: map_at_3
2407
+ value: 5.497
2408
+ - type: map_at_5
2409
+ value: 7.278
2410
+ - type: mrr_at_1
2411
+ value: 30.612000000000002
2412
+ - type: mrr_at_10
2413
+ value: 45.534
2414
+ - type: mrr_at_100
2415
+ value: 45.792
2416
+ - type: mrr_at_1000
2417
+ value: 45.806999999999995
2418
+ - type: mrr_at_3
2419
+ value: 37.755
2420
+ - type: mrr_at_5
2421
+ value: 43.469
2422
+ - type: ndcg_at_1
2423
+ value: 26.531
2424
+ - type: ndcg_at_10
2425
+ value: 26.235000000000003
2426
+ - type: ndcg_at_100
2427
+ value: 39.17
2428
+ - type: ndcg_at_1000
2429
+ value: 51.038
2430
+ - type: ndcg_at_3
2431
+ value: 23.625
2432
+ - type: ndcg_at_5
2433
+ value: 24.338
2434
+ - type: precision_at_1
2435
+ value: 30.612000000000002
2436
+ - type: precision_at_10
2437
+ value: 24.285999999999998
2438
+ - type: precision_at_100
2439
+ value: 8.224
2440
+ - type: precision_at_1000
2441
+ value: 1.6179999999999999
2442
+ - type: precision_at_3
2443
+ value: 24.490000000000002
2444
+ - type: precision_at_5
2445
+ value: 24.898
2446
+ - type: recall_at_1
2447
+ value: 2.701
2448
+ - type: recall_at_10
2449
+ value: 17.997
2450
+ - type: recall_at_100
2451
+ value: 51.766999999999996
2452
+ - type: recall_at_1000
2453
+ value: 87.863
2454
+ - type: recall_at_3
2455
+ value: 6.295000000000001
2456
+ - type: recall_at_5
2457
+ value: 9.993
2458
+ - task:
2459
+ type: Classification
2460
+ dataset:
2461
+ type: mteb/toxic_conversations_50k
2462
+ name: MTEB ToxicConversationsClassification
2463
+ config: default
2464
+ split: test
2465
+ revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
2466
+ metrics:
2467
+ - type: accuracy
2468
+ value: 73.3474
2469
+ - type: ap
2470
+ value: 15.393431414459924
2471
+ - type: f1
2472
+ value: 56.466681887882416
2473
+ - task:
2474
+ type: Classification
2475
+ dataset:
2476
+ type: mteb/tweet_sentiment_extraction
2477
+ name: MTEB TweetSentimentExtractionClassification
2478
+ config: default
2479
+ split: test
2480
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
2481
+ metrics:
2482
+ - type: accuracy
2483
+ value: 62.062818336163
2484
+ - type: f1
2485
+ value: 62.11230840463252
2486
+ - task:
2487
+ type: Clustering
2488
+ dataset:
2489
+ type: mteb/twentynewsgroups-clustering
2490
+ name: MTEB TwentyNewsgroupsClustering
2491
+ config: default
2492
+ split: test
2493
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
2494
+ metrics:
2495
+ - type: v_measure
2496
+ value: 42.464892820845115
2497
+ - task:
2498
+ type: PairClassification
2499
+ dataset:
2500
+ type: mteb/twittersemeval2015-pairclassification
2501
+ name: MTEB TwitterSemEval2015
2502
+ config: default
2503
+ split: test
2504
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
2505
+ metrics:
2506
+ - type: cos_sim_accuracy
2507
+ value: 86.15962329379508
2508
+ - type: cos_sim_ap
2509
+ value: 74.73674057919256
2510
+ - type: cos_sim_f1
2511
+ value: 68.81245642574947
2512
+ - type: cos_sim_precision
2513
+ value: 61.48255813953488
2514
+ - type: cos_sim_recall
2515
+ value: 78.12664907651715
2516
+ - type: dot_accuracy
2517
+ value: 86.15962329379508
2518
+ - type: dot_ap
2519
+ value: 74.7367634988281
2520
+ - type: dot_f1
2521
+ value: 68.81245642574947
2522
+ - type: dot_precision
2523
+ value: 61.48255813953488
2524
+ - type: dot_recall
2525
+ value: 78.12664907651715
2526
+ - type: euclidean_accuracy
2527
+ value: 86.15962329379508
2528
+ - type: euclidean_ap
2529
+ value: 74.7367761466634
2530
+ - type: euclidean_f1
2531
+ value: 68.81245642574947
2532
+ - type: euclidean_precision
2533
+ value: 61.48255813953488
2534
+ - type: euclidean_recall
2535
+ value: 78.12664907651715
2536
+ - type: manhattan_accuracy
2537
+ value: 86.21326816474935
2538
+ - type: manhattan_ap
2539
+ value: 74.64416473733951
2540
+ - type: manhattan_f1
2541
+ value: 68.80924855491331
2542
+ - type: manhattan_precision
2543
+ value: 61.23456790123457
2544
+ - type: manhattan_recall
2545
+ value: 78.52242744063325
2546
+ - type: max_accuracy
2547
+ value: 86.21326816474935
2548
+ - type: max_ap
2549
+ value: 74.7367761466634
2550
+ - type: max_f1
2551
+ value: 68.81245642574947
2552
+ - task:
2553
+ type: PairClassification
2554
+ dataset:
2555
+ type: mteb/twitterurlcorpus-pairclassification
2556
+ name: MTEB TwitterURLCorpus
2557
+ config: default
2558
+ split: test
2559
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
2560
+ metrics:
2561
+ - type: cos_sim_accuracy
2562
+ value: 88.97620988085536
2563
+ - type: cos_sim_ap
2564
+ value: 86.08680845745758
2565
+ - type: cos_sim_f1
2566
+ value: 78.02793637114438
2567
+ - type: cos_sim_precision
2568
+ value: 73.11082699683736
2569
+ - type: cos_sim_recall
2570
+ value: 83.65414228518632
2571
+ - type: dot_accuracy
2572
+ value: 88.97620988085536
2573
+ - type: dot_ap
2574
+ value: 86.08681149437946
2575
+ - type: dot_f1
2576
+ value: 78.02793637114438
2577
+ - type: dot_precision
2578
+ value: 73.11082699683736
2579
+ - type: dot_recall
2580
+ value: 83.65414228518632
2581
+ - type: euclidean_accuracy
2582
+ value: 88.97620988085536
2583
+ - type: euclidean_ap
2584
+ value: 86.08681215460771
2585
+ - type: euclidean_f1
2586
+ value: 78.02793637114438
2587
+ - type: euclidean_precision
2588
+ value: 73.11082699683736
2589
+ - type: euclidean_recall
2590
+ value: 83.65414228518632
2591
+ - type: manhattan_accuracy
2592
+ value: 88.88888888888889
2593
+ - type: manhattan_ap
2594
+ value: 86.02916327562438
2595
+ - type: manhattan_f1
2596
+ value: 78.02063045516843
2597
+ - type: manhattan_precision
2598
+ value: 73.38851947346994
2599
+ - type: manhattan_recall
2600
+ value: 83.2768709578072
2601
+ - type: max_accuracy
2602
+ value: 88.97620988085536
2603
+ - type: max_ap
2604
+ value: 86.08681215460771
2605
+ - type: max_f1
2606
+ value: 78.02793637114438
2607
+ ---
2608
+ <!-- TODO: add evaluation results here -->
2609
+ <br><br>
2610
+
2611
+ <p align="center">
2612
+ <img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
2613
+ </p>
2614
+
2615
+
2616
+ <p align="center">
2617
+ <b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
2618
+ </p>
2619
+
2620
+ ## Quick Start
2621
+
2622
+ The easiest way to start using `jina-embeddings-v2-base-en` is Jina AI's [Embedding API](https://jina.ai/embeddings/).
2623
+
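+ For example, a minimal sketch of calling the API over HTTP with `requests` could look like the following. The endpoint, request schema, and the `JINA_API_KEY` variable are assumptions for illustration; consult the linked page for the authoritative parameters:
+
+ ```python
+ import os
+ import requests
+
+ # Assumed endpoint and request body; verify against https://jina.ai/embeddings/.
+ response = requests.post(
+     "https://api.jina.ai/v1/embeddings",
+     headers={"Authorization": f"Bearer {os.environ['JINA_API_KEY']}"},  # hypothetical env var holding your API key
+     json={
+         "model": "jina-embeddings-v2-base-en",
+         "input": ["How is the weather today?", "What is the current weather like today?"],
+     },
+ )
+ response.raise_for_status()
+ embeddings = [item["embedding"] for item in response.json()["data"]]
+ print(len(embeddings), len(embeddings[0]))
+ ```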
2624
+ ## Intended Usage & Model Info
2625
+
2626
+ `jina-embeddings-v2-base-en` is an English, monolingual **embedding model** supporting an **8192-token sequence length**.
2627
+ It is based on a BERT architecture (JinaBERT) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence lengths.
2628
+ The backbone `jina-bert-v2-base-en` is pretrained on the C4 dataset.
2629
+ The model is further trained on Jina AI's collection of more than 400 million sentence pairs and hard negatives.
2630
+ These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
2631
+
2632
+ The embedding model was trained with a 512-token sequence length, but extrapolates to an 8k sequence length (or even longer) thanks to ALiBi.
2633
+ This makes the model useful for a range of use cases involving long documents, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG, and LLM-based generative search.
2634
+
2635
+ With a standard size of 137 million parameters, the model enables fast inference while delivering better performance than our small model. It is recommended to use a single GPU for inference.
2636
+ Additionally, we provide the following embedding models:
2637
+
2638
+ - [`jina-embeddings-v2-small-en`](https://huggingface.co/jinaai/jina-embeddings-v2-small-en): 33 million parameters.
2639
+ - [`jina-embeddings-v2-base-en`](https://huggingface.co/jinaai/jina-embeddings-v2-base-en): 137 million parameters **(you are here)**.
2640
+ - [`jina-embeddings-v2-base-zh`](https://huggingface.co/jinaai/jina-embeddings-v2-base-zh): Chinese-English Bilingual embeddings.
2641
+ - [`jina-embeddings-v2-base-de`](https://huggingface.co/jinaai/jina-embeddings-v2-base-de): German-English Bilingual embeddings.
2642
+ - [`jina-embeddings-v2-base-es`](https://huggingface.co/jinaai/jina-embeddings-v2-base-es): Spanish-English Bilingual embeddings.
2643
+
2644
+ ## Data & Parameters
2645
+
2646
+ Jina Embeddings V2 [technical report](https://arxiv.org/abs/2310.19923)
2647
+
2648
+ ## Usage
2649
+
2650
+ **<details><summary>Please apply mean pooling when integrating the model.</summary>**
2651
+ <p>
2652
+
2653
+ ### Why mean pooling?
2654
+
2655
+ Mean pooling takes all token embeddings from the model output and averages them at the sentence/paragraph level.
2656
+ It has proven to be the most effective way to produce high-quality sentence embeddings.
2657
+ We offer an `encode` function to deal with this.
2658
+
2659
+ However, if you would like to do it without using the default `encode` function:
2660
+
2661
+ ```python
2662
+ import torch
2663
+ import torch.nn.functional as F
2664
+ from transformers import AutoTokenizer, AutoModel
2665
+
2666
+ def mean_pooling(model_output, attention_mask):
2667
+ token_embeddings = model_output[0]
2668
+ input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
2669
+ return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
2670
+
2671
+ sentences = ['How is the weather today?', 'What is the current weather like today?']
2672
+
2673
+ tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-small-en')
2674
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-small-en', trust_remote_code=True)
2675
+
2676
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
2677
+
2678
+ with torch.no_grad():
2679
+ model_output = model(**encoded_input)
2680
+
2681
+ embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
2682
+ embeddings = F.normalize(embeddings, p=2, dim=1)
2683
+ ```
2684
+
2685
+ </p>
2686
+ </details>
2687
+
2688
+ You can use Jina Embedding models directly from the `transformers` package.
2689
+
2690
+ First, make sure that you are logged into Hugging Face. You can use the huggingface-cli tool (after installing the `transformers` package) and pass your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens):
2691
+ ```bash
2692
+ huggingface-cli login
2693
+ ```
2694
+ Alternatively, you can provide the access token as an environment variable in the shell:
2695
+ ```bash
2696
+ export HF_TOKEN="<your token here>"
2697
+ ```
2698
+ or in Python:
2699
+ ```python
2700
+ import os
2701
+
2702
+ os.environ['HF_TOKEN'] = "<your token here>"
2703
+ ```
2704
+
2705
+ Then, you can load and use the model via the `AutoModel` class:
2706
+
2707
+ ```python
2708
+ # Install the package first if needed: pip install transformers
2709
+ from transformers import AutoModel
2710
+ from numpy.linalg import norm
2711
+
2712
+ cos_sim = lambda a,b: (a @ b.T) / (norm(a)*norm(b))
2713
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True) # trust_remote_code is needed to use the encode method
2714
+ embeddings = model.encode(['How is the weather today?', 'What is the current weather like today?'])
2715
+ print(cos_sim(embeddings[0], embeddings[1]))
2716
+ ```
2717
+
2718
+ If you only want to handle shorter sequences, such as 2k tokens, pass the `max_length` parameter to the `encode` function:
2719
+
2720
+ ```python
2721
+ embeddings = model.encode(
2722
+ ['Very long ... document'],
2723
+ max_length=2048
2724
+ )
2725
+ ```
2726
+
2727
+ As of its latest release (v2.3.0), `sentence-transformers` also supports Jina embeddings (please make sure that you are logged into Hugging Face as well):
2728
+
2729
+ ```python
2730
+ # Install the package first if needed: pip install -U sentence-transformers
2731
+ from sentence_transformers import SentenceTransformer
2732
+ from sentence_transformers.util import cos_sim
2733
+
2734
+ model = SentenceTransformer(
2735
+ "jinaai/jina-embeddings-v2-base-en", # switch to en/zh for English or Chinese
2736
+ trust_remote_code=True
2737
+ )
2738
+
2739
+ # control your input sequence length up to 8192
2740
+ model.max_seq_length = 1024
2741
+
2742
+ embeddings = model.encode([
2743
+ 'How is the weather today?',
2744
+ 'What is the current weather like today?'
2745
+ ])
2746
+ print(cos_sim(embeddings[0], embeddings[1]))
2747
+ ```
2748
+
2749
+ ## Alternatives to Using the Transformers (or SentenceTransformers) Package
2750
+
2751
+ 1. _Managed SaaS_: Get started with a free key on Jina AI's [Embedding API](https://jina.ai/embeddings/).
2752
+ 2. _Private and high-performance deployment_: Get started by picking from our suite of models and deploying them on [AWS Sagemaker](https://aws.amazon.com/marketplace/seller-profile?id=seller-stch2ludm6vgy).
2753
+
2754
+
2755
+ ## Use Jina Embeddings for RAG
2756
+
2757
+ According to the latest blog post from [LlamaIndex](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83),
2758
+
2759
+ > In summary, to achieve the peak performance in both hit rate and MRR, the combination of OpenAI or JinaAI-Base embeddings with the CohereRerank/bge-reranker-large reranker stands out.
2760
+
2761
+ <img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*ZP2RVejCZovF3FDCg-Bx3A.png" width="780px">
2762
+
2763
+
2764
+ ## Plans
2765
+
2766
+ 1. Bilingual embedding models supporting more European & Asian languages, including Spanish, French, Italian and Japanese.
2767
+ 2. Multimodal embedding models enabling multimodal RAG applications.
2768
+ 3. High-performance rerankers.
2769
+
2770
+ ## Troubleshooting
2771
+
2772
+ **Loading of Model Code failed**
2773
+
2774
+ If you forget to pass the `trust_remote_code=True` flag when calling `AutoModel.from_pretrained` or when initializing the model via the `SentenceTransformer` class, you will receive an error that the model weights could not be initialized.
2775
+ This is caused by `transformers` falling back to creating a default BERT model instead of a Jina embedding model:
2776
+
2777
+ ```bash
2778
+ Some weights of the model checkpoint at jinaai/jina-embeddings-v2-base-en were not used when initializing BertModel: ['encoder.layer.2.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.8.mlp.layernorm.bias', ...
2779
+ ```
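+
+ A minimal sketch of the corrected calls, passing `trust_remote_code=True` to both APIs (same model id as used elsewhere in this card), is shown below:
+
+ ```python
+ from transformers import AutoModel
+ from sentence_transformers import SentenceTransformer
+
+ # trust_remote_code=True lets transformers load the custom JinaBERT implementation
+ # instead of silently falling back to a plain BertModel with unused/random weights.
+ model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
+ st_model = SentenceTransformer('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
+ ```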
2780
+
2781
+
2782
+ **User is not logged into Hugging Face**
2783
+
2784
+ The model is only available under [gated access](https://huggingface.co/docs/hub/models-gated).
2785
+ This means you need to be logged into Hugging Face to load it.
2786
+ If you receive the following error, you need to provide an access token, either by using the huggingface-cli or by providing the token via an environment variable as described above:
2787
+ ```bash
2788
+ OSError: jinaai/jina-embeddings-v2-base-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
2789
+ If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
2790
+ ```
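+
+ If you are unsure whether a valid token is visible to your Python process, one way to check (a sketch, assuming the `huggingface_hub` package is installed) is:
+
+ ```python
+ from huggingface_hub import whoami
+
+ # Raises an error if no valid token is found; otherwise returns your account info.
+ print(whoami()["name"])
+ ```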
2791
+
2792
+ ## Contact
2793
+
2794
+ Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.
2795
+
2796
+ ## Citation
2797
+
2798
+ If you find Jina Embeddings useful in your research, please cite the following paper:
2799
+
2800
+ ```
2801
+ @misc{günther2023jina,
2802
+ title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
2803
+ author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
2804
+ year={2023},
2805
+ eprint={2310.19923},
2806
+ archivePrefix={arXiv},
2807
+ primaryClass={cs.CL}
2808
+ }
2809
+ ```
config.json ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "jinaai/jina-bert-implementation",
3
+ "model_max_length": 8192,
4
+ "architectures": [
5
+ "JinaBertForMaskedLM"
6
+ ],
7
+ "attention_probs_dropout_prob": 0.0,
8
+ "auto_map": {
9
+ "AutoConfig": "jinaai/jina-bert-implementation--configuration_bert.JinaBertConfig",
10
+ "AutoModelForMaskedLM": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForMaskedLM",
11
+ "AutoModel": "jinaai/jina-bert-implementation--modeling_bert.JinaBertModel",
12
+ "AutoModelForSequenceClassification": "jinaai/jina-bert-implementation--modeling_bert.JinaBertForSequenceClassification"
13
+ },
14
+ "classifier_dropout": null,
15
+ "gradient_checkpointing": false,
16
+ "hidden_act": "gelu",
17
+ "hidden_dropout_prob": 0.1,
18
+ "hidden_size": 768,
19
+ "initializer_range": 0.02,
20
+ "intermediate_size": 3072,
21
+ "layer_norm_eps": 1e-12,
22
+ "max_position_embeddings": 8192,
23
+ "model_type": "bert",
24
+ "num_attention_heads": 12,
25
+ "num_hidden_layers": 12,
26
+ "pad_token_id": 0,
27
+ "position_embedding_type": "alibi",
28
+ "torch_dtype": "float32",
29
+ "transformers_version": "4.26.0",
30
+ "type_vocab_size": 2,
31
+ "use_cache": true,
32
+ "vocab_size": 30528,
33
+ "feed_forward_type": "geglu",
34
+ "emb_pooler": "mean"
35
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "2.2.2",
4
+ "transformers": "4.31.0",
5
+ "pytorch": "2.0.1"
6
+ }
7
+ }
coreml/float32_model.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:106c1ee920a9ea9d8d30523206bf862f2469a7d3e5e9c90c109bfde3df898060
3
+ size 135698
coreml/float32_model.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a8c81b424167aab6b88182cc404fb7d7517084597121ca63c10a9423e26186e2
3
+ size 550683456
coreml/float32_model.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "4BDEEC23-5067-410C-8A8F-A649FD4360D9": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Specification",
7
+ "name": "model.mlmodel",
8
+ "path": "com.apple.CoreML/model.mlmodel"
9
+ },
10
+ "B7AAB529-A51F-4EB3-B2CD-4BDA80250E6F": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Weights",
13
+ "name": "weights",
14
+ "path": "com.apple.CoreML/weights"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "4BDEEC23-5067-410C-8A8F-A649FD4360D9"
18
+ }
generation_config.json ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "pad_token_id": 0,
4
+ "transformers_version": "4.26.0"
5
+ }
model-w-mean-pooling.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f66a1ed716aba330616d750db872e6b535eb585bd6aed390207037b7e4085410
3
+ size 547365958
model.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a6bccce798906f269ee6990d35b8a516390a9593cde824de2e6b9d087b07fa4d
3
+ size 547390322
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b70f1386f05b9703ea4edf7f1550a8925399f9580e4cc754cc099efc1e736d8
3
+ size 274757256
modules.json ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6cd5a65131aa1db04c4146f474bdf68fac06417cba56789f4e6aaabd190e2818
3
+ size 274773117
sentence_bert_config.json ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false,
4
+ "model_args": {"trust_remote_code": true}
5
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "clean_up_tokenization_spaces": true,
3
+ "cls_token": "[CLS]",
4
+ "do_basic_tokenize": true,
5
+ "do_lower_case": true,
6
+ "mask_token": "[MASK]",
7
+ "model_max_length": 2147483648,
8
+ "never_split": null,
9
+ "pad_token": "[PAD]",
10
+ "sep_token": "[SEP]",
11
+ "strip_accents": null,
12
+ "tokenize_chinese_chars": true,
13
+ "tokenizer_class": "BertTokenizer",
14
+ "unk_token": "[UNK]"
15
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff