This model is sparsetral-16x7B-v2 further tuned utilizing SPIN on OpenHermes-2.5 mixed with traditional DPO samples. This is iteration_0, plan to keep making iterations until improvements stop.

Kuru~ Kuru~ Kuru~ Kuru~

Training

  • 8x A6000s
  • Base model is sparsetral-16x7B-v2
  • Forked version of unsloth for efficient training
  • Sequence Length: 4096
  • Effective batch size: 64
  • Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
  • Epochs: 2
  • 50k samples (~15k traditional dpo samples, rest SPIN)
  • QLoRA:
    • 256 r and 256 alpha
    • target_modules=[
          "q_proj",
          "k_proj",
          "v_proj",
          "o_proj",
          "gate_proj",
          "up_proj",
          "down_proj",
          "adapter_down",
          "adapter_up",
      ]
      

Prompt Format

<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Other Information

Paper reference: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Original Paper repo

Forked repo with mistral support (sparsetral)

If you are interested in faster inferencing, check out our fork of vLLM that adds sparsetral support

Downloads last month
9
Safetensors
Model size
9.39B params
Tensor type
BF16
·
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Datasets used to train serpdotai/sparsetral-16x7B-v2-SPIN_iter0