phixtral-4x2_8-gates-poc
phixtral-4x2_8-gates-poc is phixtral-4x2_8 with fine-tuned gates, intended to improve expert selection and to break the symmetry. As a proof of concept (POC), we only used 400 shorter samples from openhermes.
phixtral-4x2_8 is the first Mixture of Experts (MoE) made with four microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture. On average across the benchmarks below, it scores higher than each of the individual experts evaluated so far.
Evaluation
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
phixtral-4x2_8 | 33.91 | 70.44 | 48.78 | 37.68 | 47.7 |
dolphin-2_6-phi-2 | 33.12 | 69.85 | 47.39 | 37.2 | 46.89 |
phi-2-dpo | 30.39 | 71.68 | 50.75 | 34.9 | 46.93 |
phi-2-sft-dpo-gpt4_en-ep1 | 30.61 | 71.13 | 48.74 | 35.23 | 46.43 |
phi-2-coder | TBD | TBD | TBD | TBD | TBD |
phi-2 | 27.98 | 70.8 | 44.43 | 35.21 | 44.61 |
Check YALL - Yet Another LLM Leaderboard to compare it with other models.
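The Average column appears to be the unweighted mean of the four benchmark scores. The following is a quick sanity check under that assumption, with the numbers copied from the table above (the `scores` dictionary and the `mean` helper are illustrative names, not part of any evaluation harness):

```python
# Scores copied from the table: AGIEval, GPT4All, TruthfulQA, Bigbench,
# followed by the Average value reported in the table.
scores = {
    "phixtral-4x2_8": ([33.91, 70.44, 48.78, 37.68], 47.7),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.2], 46.89),
    "phi-2-dpo": ([30.39, 71.68, 50.75, 34.9], 46.93),
    "phi-2-sft-dpo-gpt4_en-ep1": ([30.61, 71.13, 48.74, 35.23], 46.43),
    "phi-2": ([27.98, 70.8, 44.43, 35.21], 44.61),
}


def mean(values):
    """Unweighted arithmetic mean (assumed to be how Average is computed)."""
    return sum(values) / len(values)


# Absolute difference between the recomputed mean and the reported Average.
deviations = {
    name: abs(mean(values) - reported)
    for name, (values, reported) in scores.items()
}

for name, (values, reported) in scores.items():
    print(f"{name}: recomputed={mean(values):.4f} reported={reported}")
```

Every recomputed mean agrees with the reported Average to within rounding, which is consistent with the assumption.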
Configuration
The model was built with a custom version of the mergekit library (mixtral branch) and the following configuration:
base_model: cognitivecomputations/dolphin-2_6-phi-2
gate_mode: cheap_embed
experts:
  - source_model: cognitivecomputations/dolphin-2_6-phi-2
    positive_prompts: [""]
  - source_model: lxuechen/phi-2-dpo
    positive_prompts: [""]
  - source_model: Yhyu13/phi-2-sft-dpo-gpt4_en-ep1
    positive_prompts: [""]
  - source_model: mrm8488/phi-2-coder
    positive_prompts: [""]
Usage
Here's a Colab notebook to run Phixtral in 4-bit precision on a free T4 GPU.
!pip install -q --upgrade transformers einops accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "phixtral-4x2_8"
instruction = '''
def print_prime(n):
    """
    Print all primes between 1 and n
    """
'''
torch.set_default_device("cuda")
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    f"mlabonne/{model_name}",
    torch_dtype="auto",
    load_in_4bit=True,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    f"mlabonne/{model_name}",
    trust_remote_code=True
)
# Tokenize the input string
inputs = tokenizer(
    instruction,
    return_tensors="pt",
    return_attention_mask=False
)
# Generate text using the model
outputs = model.generate(**inputs, max_length=200)
# Decode and print the output
text = tokenizer.batch_decode(outputs)[0]
print(text)
Inspired by mistralai/Mixtral-8x7B-v0.1, you can specify the num_experts_per_tok and num_local_experts in the config.json file (2 and 4 by default). This configuration is automatically loaded in configuration.py.
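As a rough illustration of changing these two fields, the helper below edits a local copy of config.json using only the standard library. The function name `set_expert_counts` and the path you pass to it are hypothetical. The only check it enforces is that the number of experts used per token cannot exceed the number of experts available. Whether non-default values load cleanly for a given checkpoint is not verified here.

```python
import json
from pathlib import Path


def set_expert_counts(config_path, num_experts_per_tok=2, num_local_experts=4):
    """Update the two MoE fields in a local copy of config.json.

    Hypothetical helper: it only rewrites the JSON file and does not
    load or validate the model itself.
    """
    # Top-k routing cannot select more experts than exist.
    if not 1 <= num_experts_per_tok <= num_local_experts:
        raise ValueError(
            "num_experts_per_tok must be between 1 and num_local_experts"
        )
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["num_experts_per_tok"] = num_experts_per_tok
    config["num_local_experts"] = num_local_experts
    # Write the file back, leaving every other key untouched.
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, `set_expert_counts("config.json", num_experts_per_tok=1)` would route each token to a single expert while keeping all four experts available.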
vince62s implemented the MoE inference code in the modeling_phi.py file. In particular, see the MoE class.
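To make the role of num_experts_per_tok concrete, here is a minimal pure-Python sketch of Mixtral-style top-k routing for a single token. It is not the code from modeling_phi.py. The function names, the plain-list representation, and the example numbers are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, expert_outputs, num_experts_per_tok=2):
    """Mix the outputs of the top-k experts for one token.

    gate_logits: one score per expert (what a gate layer would produce).
    expert_outputs: one output vector per expert, all of equal length.
    """
    # Select the k experts with the highest gate scores.
    order = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )
    top = order[:num_experts_per_tok]
    # Normalize the selected scores so the mixing weights sum to 1.
    weights = softmax([gate_logits[i] for i in top])
    # Weighted sum of the selected experts' outputs.
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            mixed[d] += w * expert_outputs[i][d]
    return top, weights, mixed


# Four experts (num_local_experts=4), two used per token.
top, weights, mixed = route_token(
    gate_logits=[0.1, 2.0, -1.0, 2.0],
    expert_outputs=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]],
)
print(top, weights, mixed)
```

In a real model the same selection runs per token over tensors, and only the selected experts are evaluated, which is where the compute savings of an MoE come from.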
Acknowledgments
A special thanks to vince62s for the inference code and the dynamic configuration of the number of experts. He was very patient and helped me to debug everything.
Thanks to Charles Goddard for the mergekit library and the implementation of the MoE for clowns.
Thanks to ehartford, lxuechen, Yhyu13, and mrm8488 for their fine-tuned phi-2 models.