---
license: mit
widget:
- text: >
    <|system|>

    You are a chatbot who can help code!</s>

    <|user|>

    Write me a function to calculate the first 10 digits of the fibonacci
    sequence in Python and print it out to the CLI.</s>

    <|assistant|>
- text: >
    <|system|> You are penguinotron, a penguin-themed chatbot who is obsessed
    with penguins and will make any excuse to talk about them

    <|user|>

    Hello, what is a penguin?

    <|assistant|>
library_name: transformers
pipeline_tag: text-generation
tags:
- moe
- nlp
---

# Tiny-llamix

## Model Description

Tiny-llamix is a model built from [TinyLlama](https://huggingface.co./TinyLlama/TinyLlama-1.1B-Chat-v1.0) using [Charles Goddard's](https://github.com/cg123) mergekit on the mixtral branch. Though technically a Mixtral model, it can be plugged into most llama implementations (maybe...). The model uses TinyLlama's tokenizer and works with the same prompt format.

This model is a proof of concept and does not necessarily yield better outputs (IDK, I haven't tested it...).

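Because the tokenizer and prompt format come straight from TinyLlama, you can also build prompts with `apply_chat_template` instead of writing the special tokens by hand. A minimal sketch, assuming the repo ships TinyLlama's chat template unchanged (not something I have verified):

```python
from transformers import AutoTokenizer

# Assumption: SE6446/Tiny-llamix inherits TinyLlama's chat template.
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

messages = [
    {"role": "system", "content": "You are a chatbot who can help code!"},
    {"role": "user", "content": "Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI."},
]

# Renders the <|system|> / <|user|> / <|assistant|> layout used in the widget examples above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
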
## Configuration

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "M1"
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "M2"
```

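Both experts are copies of the same TinyLlama checkpoint, gated on the "M1"/"M2" prompts above. If you want to check what the merge actually produced, you can peek at the model config; this is just a sketch and assumes the checkpoint exposes the usual Mixtral-style fields:

```python
from transformers import AutoConfig

# Assumption: the merged checkpoint uses a Mixtral-style config with expert counts.
config = AutoConfig.from_pretrained("SE6446/Tiny-llamix")

print(config.model_type)                            # expected to be "mixtral"
print(getattr(config, "num_local_experts", None))   # expected to be 2, one per expert entry above
print(getattr(config, "num_experts_per_tok", None))
```
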
## Usage

It can be used like any other causal language model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("SE6446/Tiny-llamix").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

# Write and tokenize the prompt
instruction = '''<|system|>\nYou are a chatbot who can help code!</s>
<|user|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s>
<|assistant|>'''
inputs = tokenizer(instruction, return_tensors="pt", return_attention_mask=False).to("cuda")

# Generate
outputs = model.generate(**inputs, max_length=200)

# Decode and print the output
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

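Since the card is tagged `text-generation`, the high-level pipeline API should also work. A rough sketch (the sampling settings are arbitrary, not tuned for this model):

```python
from transformers import pipeline

# device=0 assumes a CUDA GPU; drop it to run on CPU.
generator = pipeline("text-generation", model="SE6446/Tiny-llamix", device=0)

prompt = (
    "<|system|>\nYou are penguinotron, a penguin-themed chatbot who is obsessed "
    "with penguins and will make any excuse to talk about them</s>\n"
    "<|user|>\nHello, what is a penguin?</s>\n"
    "<|assistant|>"
)
result = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```
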
## Acknowledgements

To [Charles Goddard](https://github.com/cg123) for creating the tool and for explaining it in his [blog](https://goddard.blog/posts/clown-moe/) in a way a buffoon like me could understand.

To [TinyLlama](https://huggingface.co./TinyLlama) for providing the model as open source!