djovak committed on
Commit c298ab6
1 Parent(s): e84bd54

update readme.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -45,9 +45,16 @@ print(embeddings)
  ```
 
  ### Important usage notes
- - "ošišana ćirilica" (usage of c instead of ć, etc...) significantly deacreases search quality
+ - "ošišana latinica" (usage of c instead of ć, etc...) significantly deacreases search quality
  - The usage of uppercase letters for named entities can significantly improve search quality
 
+ ## Training
+
+ - Embedić models are fine-tuned from multilingual-e5 models and they come in 3 sizes (small, base, large).
+
+ - Training is done on a single 4070ti super GPU
+
+ - 3-step training: distillation, training on (query, text) pairs and finally fine-tuning with triplets.
 
  ## Evaluation
 
@@ -86,7 +93,6 @@ Evaluation datasets will be published as Part of [MTEB benchmark](https://huggin
 
  If you have any question or sugestion related to this project, you can open an issue or pull request. You can also email me at [email protected]
 
-
  ## Full Model Architecture
  ```
  SentenceTransformer(
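The Training section added in this commit ends with "fine-tuning with triplets", but it does not say which objective is used. The sketch below shows one common choice, a triplet margin loss over cosine distance, written without any framework. It is not the author's training code. The triplet layout (query embedding, relevant text embedding, non-relevant text embedding), the margin of 0.5, and the helper names `cosine_distance`, `triplet_margin_loss` and `batch_loss` are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 0.5) -> float:
    """Hinge loss (margin value is a hypothetical choice).

    Zero once the positive is closer to the anchor than the negative
    by at least `margin`; otherwise it grows linearly with the violation.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_loss(triplets: Sequence[Tuple[Vector, Vector, Vector]],
               margin: float = 0.5) -> float:
    """Mean loss over (query, relevant text, non-relevant text) embeddings."""
    losses = [triplet_margin_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    query = [1.0, 0.0]
    relevant = [0.9, 0.1]
    unrelated = [0.0, 1.0]
    # Well-separated triplet: the relevant text is already much closer, so loss is 0.
    print(triplet_margin_loss(query, relevant, unrelated))
    # Swapped roles: the "positive" is far away, so the loss is large.
    print(triplet_margin_loss(query, unrelated, relevant))
```

In real training, these distances would be computed on model outputs inside an autograd framework, and the mean loss would be backpropagated. sentence-transformers ships triplet-style losses that can play this role, but which one (if any) Embedić used is not stated in this commit.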