---
license: mit
widget:
- text: >
    <|system|>

    You are a chatbot who can help code!</s>

    <|user|>

    Write me a function to calculate the first 10 digits of the fibonacci
    sequence in Python and print it out to the CLI.</s>

    <|assistant|>
- text: >
    <|system|> You are penguinotron, a penguin-themed chatbot who is obsessed
    with penguins and will make any excuse to talk about them

    <|user|>

    Hello, what is a penguin?

    <|assistant|>
library_name: transformers
pipeline_tag: text-generation
tags:
- moe
- nlp
---
# Tiny-llamix
## Model Description
Tiny-llamix is a model built from [TinyLlama](https://huggingface.co./TinyLlama/TinyLlama-1.1B-Chat-v1.0) using [Charles Goddard's](https://github.com/cg123) mergekit on the mixtral branch. Though technically a Mixtral model, it can be plugged into most llama implementations (maybe...). The model uses TinyLlama's tokenizer and works with the same prompt format.

This model is a proof of concept and won't necessarily yield better outputs. (IDK, haven't tested it...)
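
Because the merge targets the Mixtral architecture, the saved config should report a Mixtral model type even though every expert is a llama checkpoint. A minimal sketch to check this from Python (assuming only `transformers` and the repo id used below):
```python
from transformers import AutoConfig

# Fetch only the config; no weights are downloaded.
config = AutoConfig.from_pretrained("SE6446/Tiny-llamix")

print(config.model_type)         # should print "mixtral"
print(config.num_local_experts)  # 2, matching the experts in the configuration below
```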
## Configuration
```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16 
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "M1"
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
     - "M2"
```
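Both experts are the same TinyLlama checkpoint and the positive prompts ("M1"/"M2") are placeholders, which is part of why this is a proof of concept rather than a routed specialist model. To reproduce the merge, save the config above (e.g. as `config.yaml`) and run it through mergekit's MoE entry point; on the mixtral branch this is the `mergekit-moe` script (roughly `mergekit-moe config.yaml ./Tiny-llamix`), though the exact invocation may differ between mergekit revisions.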
## Usage
It can be used like any other model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("SE6446/Tiny-llamix").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

# Write and tokenize the prompt (TinyLlama chat format)
instruction = '''<|system|>\nYou are a chatbot who can help code!</s>
<|user|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s>
<|assistant|>'''
inputs = tokenizer(instruction, return_tensors="pt", return_attention_mask=False).to("cuda")

# Generate a completion
outputs = model.generate(**inputs, max_length=200)

# Decode and print the result
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
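
Since the tokenizer comes from TinyLlama-1.1B-Chat, it should also carry that model's chat template, so you can build the prompt from structured messages instead of hand-writing the special tokens. A sketch under that assumption:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("SE6446/Tiny-llamix").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

# Assumes the tokenizer inherits TinyLlama-Chat's chat template, which inserts
# the <|system|>/<|user|>/<|assistant|> tags for you.
messages = [
    {"role": "system", "content": "You are a chatbot who can help code!"},
    {"role": "user", "content": "Write me a function to calculate the first 10 digits "
                                "of the fibonacci sequence in Python and print it out to the CLI."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.batch_decode(outputs)[0])
```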
## Acknowledgements

To [Charles Goddard](https://github.com/cg123) for creating the tool and for explaining it in his [blog](https://goddard.blog/posts/clown-moe/) in a way a buffoon like me could understand.

To [TinyLlama](https://huggingface.co./TinyLlama) for providing the model as open source!