cxllin committed on
Commit
46cfc8b
1 Parent(s): 002e591

Create README.md


updated model card

Files changed (1)
  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
---
license: apache-2.0
datasets:
- cxllin/medinstructv2
language:
- en
library_name: transformers
pipeline_tag: question-answering
tags:
- medical
---

`StableMed` is a 3-billion-parameter decoder-only language model, fine-tuned from `stabilityai/stablelm-3b-4e1t` on 18k rows of medical questions for one epoch.
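
The fine-tuning data is the `cxllin/medinstructv2` dataset listed in the metadata above. Below is a minimal sketch for inspecting it with the `datasets` library; the `train` split name is an assumption and may need adjusting:

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hugging Face Hub.
# The "train" split name is assumed; call load_dataset("cxllin/medinstructv2")
# without a split and inspect the returned DatasetDict if it differs.
ds = load_dataset("cxllin/medinstructv2", split="train")

print(ds)     # feature (column) names and number of rows
print(ds[0])  # one example row
```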

## Usage

Get started generating text with `StableMed` by using the following code snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned StableMed checkpoint (tokenizer and model from the same repo).
tokenizer = AutoTokenizer.from_pretrained("cxllin/StableMed-3b")
model = AutoModelForCausalLM.from_pretrained(
    "cxllin/StableMed-3b",
    trust_remote_code=True,
    torch_dtype="auto",
)
model.cuda()

# Ask a medical question and sample a short completion.
inputs = tokenizer("What are the common symptoms of iron-deficiency anemia?", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
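
For quick experiments, the model can also be driven through the `transformers` `pipeline` API. This is a minimal sketch rather than part of the original card: it assumes the `cxllin/StableMed-3b` repository loads as a causal LM under the `text-generation` task, and the prompt format is purely illustrative.

```python
from transformers import pipeline

# Text-generation pipeline on the first GPU; use device=-1 to run on CPU.
generator = pipeline(
    "text-generation",
    model="cxllin/StableMed-3b",
    trust_remote_code=True,
    device=0,
)

result = generator(
    "Question: What are the first-line treatments for hypertension?\nAnswer:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.75,
    top_p=0.95,
)
print(result[0]["generated_text"])
```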

### Model Architecture

The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications (a sketch for checking these values against the model config follows the list below):

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |

* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) applied to the first 25% of head embedding dimensions for improved throughput following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
* **Normalization**: LayerNorm ([Ba et al., 2016](https://arxiv.org/abs/1607.06450)) with learned bias terms as opposed to RMSNorm ([Zhang & Sennrich, 2019](https://arxiv.org/abs/1910.07467)).
* **Tokenizer**: GPT-NeoX ([Black et al., 2022](https://arxiv.org/abs/2204.06745)).
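
The table and list above can be cross-checked programmatically. The sketch below is illustrative rather than part of the original card: the standard config attributes (`hidden_size`, `num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`) should be present, but the name of the rotary-fraction field varies between implementations (e.g. `rope_pct` in the original remote code vs. `partial_rotary_factor` in newer `transformers`), so it is probed defensively:

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("cxllin/StableMed-3b", trust_remote_code=True)

# Core shape hyperparameters; these attribute names are standard in transformers configs.
print("hidden size:", config.hidden_size)
print("layers:", config.num_hidden_layers)
print("heads:", config.num_attention_heads)
print("sequence length:", config.max_position_embeddings)

# Fraction of head dimensions that receive rotary embeddings; the attribute name
# differs between implementations, so probe a few likely candidates.
for name in ("rope_pct", "partial_rotary_factor", "rotary_pct"):
    if hasattr(config, name):
        print("rotary fraction:", getattr(config, name))

# Total parameter count (expected to be close to the 2,795,443,200 quoted above).
model = AutoModelForCausalLM.from_pretrained(
    "cxllin/StableMed-3b", trust_remote_code=True, torch_dtype="auto"
)
print("parameters:", sum(p.numel() for p in model.parameters()))
```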