SeanLee97 committed on
Commit
d324ec0
1 Parent(s): 61c3508

Update README.md

  1. README.md +61 -32
README.md CHANGED
@@ -8,51 +8,80 @@ model-index:
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>]()
- # pre-pubmedbert-base-embedding

- This model is a fine-tuned version of [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 1e-06
- - train_batch_size: 64
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 3
- - total_train_batch_size: 192
- - total_eval_batch_size: 24
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 1

- ### Training results



- ### Framework versions

- - Transformers 4.42.3
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
+ # WhereIsAI/pubmed-angle-base-en

+ This is an example model accompanying the Chinese blog post [title](#) and the [AnglE tutorial](https://angle.readthedocs.io/en/latest/notes/tutorial.html#tutorial).

+ It was fine-tuned with [AnglE Loss](https://arxiv.org/abs/2309.12871) using the official [angle-emb](https://github.com/SeanLee97/AnglE) library.

+ Here are the details:

+ - Base model: [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext)
+ - Training Data: [WhereIsAI/medical-triples](https://huggingface.co/datasets/WhereIsAI/medical-triples), processed from [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA).
+ - Test Data: [WhereIsAI/pubmedqa-test-angle-format-a](https://huggingface.co/datasets/WhereIsAI/pubmedqa-test-angle-format-a)

+ **Performance:**

+ | Model | Pooling Strategy | Spearman's Correlation |
+ |----------------------------------------|------------------|:----------------------:|
+ | tavakolih/all-MiniLM-L6-v2-pubmed-full | avg | 84.56 |
+ | NeuML/pubmedbert-base-embeddings | avg | 84.88 |
+ | **WhereIsAI/pubmed-angle-base-en** | cls | 86.01 |
+ | WhereIsAI/pubmed-angle-large-en | cls | 86.21 |
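
The README does not include the evaluation script behind the Spearman's Correlation column. The following is a minimal sketch of how such a score is commonly obtained, assuming it is the rank correlation between a model's cosine similarities and the gold labels of the test pairs, scaled by 100. The function names and the toy scores are illustrative only and do not come from the repository.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical cosine similarities and gold labels for five test pairs.
predicted = [0.91, 0.35, 0.78, 0.12, 0.60]
gold = [1, 0, 1, 0, 0]
rho = spearman(predicted, gold)
print(round(100 * rho, 2))  # reported on a 0-100 scale, as in the table
```

In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written helper; the pure-Python version is only meant to show what the metric measures.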


+ ## Usage

+ ### via angle-emb

+ ```bash
+ python -m pip install -U angle-emb
+ ```

+ Example:

+ ```python
+ from angle_emb import AnglE
+ from angle_emb.utils import cosine_similarity

+ # Load the model with CLS pooling; .cuda() assumes a CUDA-capable GPU is available.
+ angle = AnglE.from_pretrained('WhereIsAI/pubmed-angle-base-en', pooling_strategy='cls').cuda()

+ query = 'How to treat childhood obesity and overweight?'
+ docs = [
+     query,
+     'The child is overweight. Parents should relieve their children\'s symptoms through physical activity and healthy eating. First, they can let them do some aerobic exercise, such as jogging, climbing, swimming, etc. In terms of diet, children should eat more cucumbers, carrots, spinach, etc. Parents should also discourage their children from eating fried foods and dried fruits, which are high in calories and fat. Parents should not let their children lie in bed without moving after eating. If their children\'s condition is serious during the treatment of childhood obesity, parents should go to the hospital for treatment under the guidance of a doctor in a timely manner.',
+     'If you want to treat tonsillitis better, you can choose some anti-inflammatory drugs under the guidance of a doctor, or use local drugs, such as washing the tonsil crypts, injecting drugs into the tonsils, etc. If your child has a sore throat, you can also give him or her some pain relievers. If your child has a fever, you can give him or her antipyretics. If the condition is serious, seek medical attention as soon as possible. If the medication does not have a good effect and the symptoms recur, the author suggests surgical treatment. Parents should also make sure to keep their children warm to prevent them from catching a cold and getting tonsillitis again.',
+ ]

+ # Encode the query together with the documents; the first embedding is the query.
+ embeddings = angle.encode(docs)
+ query_emb = embeddings[0]

+ # Score each document against the query.
+ for doc, emb in zip(docs[1:], embeddings[1:]):
+     print(cosine_similarity(query_emb, emb))
+
+ # 0.8029839020052982
+ # 0.4260630076818197
+ ```
+
+
+ ### via sentence-transformers
+
+ Install sentence-transformers:
+
+ ```bash
+ python -m pip install -U sentence-transformers
+ ```
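
The commit stops at the install command and gives no sentence-transformers example. The sketch below shows one way the model might be loaded, assuming the repository works with the standard `models.Transformer` and `models.Pooling` modules. CLS pooling is chosen to mirror `pooling_strategy='cls'` from the angle-emb example. The helper names `load_model`, `cosine`, and `rank_documents` are hypothetical and not part of either library.

```python
from math import sqrt


def load_model(name="WhereIsAI/pubmed-angle-base-en"):
    """Build a SentenceTransformer with CLS pooling (downloads the weights)."""
    # Imported lazily so the pure-Python helpers below work without the package.
    from sentence_transformers import SentenceTransformer, models

    transformer = models.Transformer(name)
    pooling = models.Pooling(
        transformer.get_word_embedding_dimension(), pooling_mode="cls"
    )
    return SentenceTransformer(modules=[transformer, pooling])


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_emb, doc_embs):
    """Return (index, score) pairs sorted by descending similarity to the query."""
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(doc_embs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage (needs network access to fetch the model, so it is left commented out):
# model = load_model()
# embeddings = model.encode(["a query", "first document", "second document"])
# print(rank_documents(embeddings[0], embeddings[1:]))
```

Scores obtained this way are not guaranteed to match the angle-emb output digit for digit, since that depends on both paths applying the same pooling and tokenization settings.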
+
+
+ ## Citation
+
+ If you use this model in academic work, please cite the AnglE paper:
+
+ ```bibtex
+ @article{li2023angle,
+   title={AnglE-optimized Text Embeddings},
+   author={Li, Xianming and Li, Jing},
+   journal={arXiv preprint arXiv:2309.12871},
+   year={2023}
+ }
+ ```