---
license: apache-2.0
language:
- en
inference: false
---
# Model Card for TinyMixtral-x8-Clonebase-7b
This model is based on [TinyLlama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T). It was converted to a Mistral model, and its FFN was then cloned into the experts of a Mixtral model.

**This model was created experimentally for training a small Mixtral.**

# How it was made
Because TinyLlama is a Llama model, I first converted it to a Mistral model.

I then cloned the FFN part of each layer to create the experts.
Since all experts are identical copies of the same tensors, performance does not change.
All gates (router weights) have the same value.

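To illustrate why cloning leaves the model unchanged, here is a minimal pure-Python sketch of one MoE layer. It assumes Mixtral-style routing (softmax over router logits, keep the top-k experts, renormalize the kept weights so they sum to 1). The toy sizes, random weights, and the zero value used for the identical gates are hypothetical, not the values used for this model.

```python
import math
import random


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x is a vector of length in_dim
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, gate_proj, up_proj, down_proj):
    # Llama/Mistral-style MLP: down_proj(silu(gate_proj(x)) * up_proj(x))
    h = [silu(g) * u for g, u in zip(matvec(gate_proj, x), matvec(up_proj, x))]
    return matvec(down_proj, h)


def moe_layer(x, router, experts, top_k):
    # Softmax over router logits, keep the top_k experts, renormalize their weights
    logits = matvec(router, x)
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = swiglu_ffn(x, *experts[i])
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
hidden, ffn, num_experts = 4, 8, 8  # toy sizes (hypothetical)

dense = (rand_matrix(ffn, hidden), rand_matrix(ffn, hidden), rand_matrix(hidden, ffn))
experts = [dense] * num_experts  # every expert holds the same (cloned) FFN weights
router = [[0.0] * hidden for _ in range(num_experts)]  # identical gates; zero is an assumed value

x = [random.uniform(-1.0, 1.0) for _ in range(hidden)]
reference = swiglu_ffn(x, *dense)
for k in (1, 2):
    out = moe_layer(x, router, experts, top_k=k)
    print(f"top_k={k}: max |moe - dense| = {max(abs(a - b) for a, b in zip(out, reference))}")
```

Because the kept routing weights are renormalized to sum to 1 and every expert computes the same function, the layer returns the original dense FFN output for any `top_k`, up to floating-point rounding. Under that assumption, the choice of `num_experts_per_tok` should not change this clone-based model's output either.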
# How To Convert
Use Google Colab with a high-memory CPU runtime.
This model was created with 8 experts, but since the experts are clones, you can create as many experts as you like.

[tinyllama_to_mixtral_clonebase.ipynb](https://huggingface.co/mmnga/TinyMixtral-x8-Clonebase-7b)

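The conversion can be sketched in plain Python on a dict that stands in for the checkpoint. This is not the author's notebook. It assumes the Hugging Face transformers weight names: Llama and Mistral checkpoints use the same names, so only the MLP keys need renaming for Mixtral (`block_sparse_moe.experts.{j}.w1/w2/w3` plus a `block_sparse_moe.gate` router), and these names may differ across transformers versions. The helper names, the gate value, and the toy weights are hypothetical.

```python
import copy
import re

# Mistral MLP projection -> Mixtral expert weight (assumed transformers naming):
# w1 = gate_proj, w3 = up_proj, w2 = down_proj
PROJ_TO_EXPERT = {"gate_proj": "w1", "up_proj": "w3", "down_proj": "w2"}
MLP_KEY = re.compile(r"^(model\.layers\.\d+)\.mlp\.(gate_proj|up_proj|down_proj)\.weight$")


def to_mixtral_config(base_config, num_experts=8, experts_per_tok=2):
    # Shared keys (hidden_size, num_hidden_layers, ...) are carried over unchanged
    cfg = copy.deepcopy(base_config)
    cfg["architectures"] = ["MixtralForCausalLM"]
    cfg["model_type"] = "mixtral"
    cfg["num_local_experts"] = num_experts
    cfg["num_experts_per_tok"] = experts_per_tok
    return cfg


def clone_ffn_to_experts(state_dict, num_experts, hidden_size, gate_value=0.0):
    new_sd = {}
    layer_prefixes = []
    for key, weight in state_dict.items():
        match = MLP_KEY.match(key)
        if match is None:
            new_sd[key] = weight  # attention, norms, embeddings keep their names
            continue
        prefix, proj = match.groups()
        if prefix not in layer_prefixes:
            layer_prefixes.append(prefix)
        for j in range(num_experts):
            expert_key = f"{prefix}.block_sparse_moe.experts.{j}.{PROJ_TO_EXPERT[proj]}.weight"
            new_sd[expert_key] = copy.deepcopy(weight)  # with real tensors: weight.clone()
    for prefix in layer_prefixes:
        # Router: one row per expert, every entry the same (hypothetical default 0.0)
        new_sd[f"{prefix}.block_sparse_moe.gate.weight"] = [
            [gate_value] * hidden_size for _ in range(num_experts)
        ]
    return new_sd


# Toy inputs (hypothetical): hidden_size=2, intermediate_size=1, one layer
toy_config = {"architectures": ["LlamaForCausalLM"], "model_type": "llama", "hidden_size": 2}
toy_state = {
    "model.embed_tokens.weight": [[0.1, 0.2]],
    "model.layers.0.self_attn.q_proj.weight": [[1.0, 0.0], [0.0, 1.0]],
    "model.layers.0.mlp.gate_proj.weight": [[1.0, 2.0]],
    "model.layers.0.mlp.up_proj.weight": [[3.0, 4.0]],
    "model.layers.0.mlp.down_proj.weight": [[5.0], [6.0]],
}

mixtral_config = to_mixtral_config(toy_config, num_experts=8)
mixtral_state = clone_ffn_to_experts(
    toy_state, num_experts=8, hidden_size=toy_config["hidden_size"]
)
print(mixtral_config["model_type"], len(mixtral_state))
```

With real weights you would load the TinyLlama checkpoint with a tensor library, copy each FFN tensor with `clone()`, and build the router as a tensor of shape `(num_experts, hidden_size)`. Changing `num_experts` is the only thing needed to create more or fewer clones.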
# Usage
~~~bash
pip install transformers --upgrade
pip install accelerate
pip install flash_attn
~~~

29
+
30
+ ~~~python
31
+ from transformers import AutoTokenizer, AutoModelForCausalLM, MixtralForCausalLM
32
+ import torch
33
+
34
+ model_name_or_path = "mmnga/TinyMixtral-x8-Clonebase-7b"
35
+
36
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
37
+ model = MixtralForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
38
+
39
+ # set num_experts_per_tok 1 or 2 ?
40
+ model.config.num_experts_per_tok = 2
41
+
42
+ # message
43
+ messages = [
44
+ {"role": "user", "content": "Tell me what's for dinner tonight."},
45
+ ]
46
+
47
+ with torch.no_grad():
48
+ token_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
49
+ output_ids = model.generate(
50
+ token_ids.to(model.device),
51
+ temperature=0.5,
52
+ do_sample=True,
53
+ top_p=0.95,
54
+ top_k=40,
55
+ max_new_tokens=128,
56
+ repetition_penalty=1.5
57
+ )
58
+ output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
59
+ print(output)
60
+
61
+ ~~~