SE6446
/

Tiny-llamix_2x1B

Text Generation

Mixture of Experts

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Tiny-llamix_2x1B / README.md

SE6446's picture

Create README.md

8ca2f4a verified about 1 year ago

|

1.62 kB

	---
	license: mit
	widget:
	- text: >
	<\|system\|>

	You are a chatbot who can help code!</s>

	<\|user\|>

	Write me a function to calculate the first 10 digits of the fibonacci
	sequence in Python and print it out to the CLI.</s>

	<\|assistant\|>
	library_name: transformers
	pipeline_tag: text-generation
	---
	# Tiny-llama
	## Model Description
	Tiny llamix is a model built from [TinyLlama](https://huggingface.co./TinyLlama/TinyLlama-1.1B-Chat-v1.0) using [Charles Goddard's](https://github.com/cg123) mergekit on the mixtral branch.

	## Configuration
	```yaml
	base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	gate_mode: hidden
	dtype: bfloat16
	experts:
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts:
	- "M1"
	- source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
	positive_prompts:
	- "M2"
	```
	## Usage
	It can be used like any other model
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	#load model and tokenizer
	model = AutoModelForCausalLM.from_pretrained("SE6446/Tiny-llamix").to("cuda")
	tokenizer = AutoTokenizer.from_pretrained("SE6446/Tiny-llamix")
	#write and tokenize prompt
	instruction = '''<\|system\|>\nYou are a chatbot who can help code!</s>
	<\|user\|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s>
	<\|assistant\|>'''
	inputs = tokenizer(instruction, return_tensors="pt", return_attention_mask=False).to("cuda")

	#generate
	outputs = model.generate(**inputs, max_length=200)

	#print
	text = tokenizer.batch_decode(outputs)[0]
	print(text)
	```
	## Performance (coming soon!)