---
license: mit
widget:
- text: >
    <|system|>

    You are a chatbot who can help code!</s>

    <|user|>

    Write me a function to calculate the first 10 digits of the fibonacci
    sequence in Python and print it out to the CLI.</s>

    <|assistant|>
- text: >
    <|system|> You are penguinotron, a penguin-themed chatbot who is obsessed
    with penguins and will make any excuse to talk about them

    <|user|>

    Hello, what is a penguin?

    <|assistant|>
library_name: transformers
pipeline_tag: text-generation
tags:
- moe
- nlp
---

# Tiny-llamix

## Model Description

Tiny-llamix is a model built from [TinyLlama](https://huggingface.co./TinyLlama/TinyLlama-1.1B-Chat-v1.0) using [Charles Goddard's](https://github.com/cg123) mergekit on the mixtral branch. Though technically a Mixtral model, it can be plugged into most llama implementations (maybe...). The model uses TinyLlama's tokenizer and works with the same prompt format.

This model is a proof of concept and does not necessarily yield better outputs (IDK, I haven't tested it...).

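Because the tokenizer and prompt format come straight from TinyLlama, you can also build prompts with `apply_chat_template` instead of writing the special tokens by hand. A minimal sketch, assuming the repo ships TinyLlama's chat template unchanged (not something I have verified):

```python
from transformers import AutoTokenizer

# Assumption: SE6446/Tiny-llamix inherits TinyLlama's chat template.
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

messages = [
    {"role": "system", "content": "You are a chatbot who can help code!"},
    {"role": "user", "content": "Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI."},
]

# Renders the <|system|> / <|user|> / <|assistant|> layout used in the widget examples above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
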
## Configuration

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "M1"
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "M2"
```

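Both experts are copies of the same TinyLlama checkpoint, gated on the "M1"/"M2" prompts above. If you want to check what the merge actually produced, you can peek at the model config; this is just a sketch and assumes the checkpoint exposes the usual Mixtral-style fields:

```python
from transformers import AutoConfig

# Assumption: the merged checkpoint uses a Mixtral-style config with expert counts.
config = AutoConfig.from_pretrained("SE6446/Tiny-llamix")

print(config.model_type)                            # expected to be "mixtral"
print(getattr(config, "num_local_experts", None))   # expected to be 2, one per expert entry above
print(getattr(config, "num_experts_per_tok", None))
```
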
## Usage

It can be used like any other causal language model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("SE6446/Tiny-llamix").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")

# Write and tokenize the prompt
instruction = '''<|system|>\nYou are a chatbot who can help code!</s>
<|user|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s>
<|assistant|>'''
inputs = tokenizer(instruction, return_tensors="pt", return_attention_mask=False).to("cuda")

# Generate
outputs = model.generate(**inputs, max_length=200)

# Decode and print the output
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

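Since the card is tagged `text-generation`, the high-level pipeline API should also work. A rough sketch (the sampling settings are arbitrary, not tuned for this model):

```python
from transformers import pipeline

# device=0 assumes a CUDA GPU; drop it to run on CPU.
generator = pipeline("text-generation", model="SE6446/Tiny-llamix", device=0)

prompt = (
    "<|system|>\nYou are penguinotron, a penguin-themed chatbot who is obsessed "
    "with penguins and will make any excuse to talk about them</s>\n"
    "<|user|>\nHello, what is a penguin?</s>\n"
    "<|assistant|>"
)
result = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```
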
## Acknowledgements

To [Charles Goddard](https://github.com/cg123) for creating the tool and for explaining it in his [blog](https://goddard.blog/posts/clown-moe/) in a way a buffoon like me could understand.

To [TinyLlama](https://huggingface.co./TinyLlama) for providing the model as open source!