Llama-3.3-FakeSwallow-70B-Instruct-v0.1

🚨 For research purposes only. This model may have repetition issues.

This is a merge of pre-trained language models created using mergekit.

  • 2024.12.11: The model weights were updated.

Test environment

🔧 HACK: Try oobabooga/text-generation-webui#5885 if multiple EOS tokens don't work.

This model was tested using text-generation-webui. I use the min_p preset with temperature=1 for generation.
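If you want to approximate these sampling settings with transformers instead of the webui, the call below is a minimal sketch. It reuses the model and model_inputs objects from the usage example further down; min_p support requires a recent transformers release, and min_p=0.05 is an assumed value for the webui min_p preset (check your local preset).

# Sketch: approximate the webui "min_p" preset with transformers' generate().
# Assumes `model` and `model_inputs` from the usage example below.
# min_p=0.05 is an assumed preset value; adjust to match your setup.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    min_p=0.05,
)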

Usage

This format must be followed strictly; deviations may result in suboptimal outputs from the model.

The template used to construct a prompt for the instruct model is specified as follows:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>

{USER_MESSAGE}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

For the "{SYSTEM_PROMPT}" part, We recommend using "あなたは誠実で優秀な日本人のアシスタントです。" or "You are a helpful assistant."

For the "{USER_MESSAGE}" part, We recommend using {instruction}\n{input}

In other words, we recommend the following:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

あなたは誠実で優秀な日本人のアシスタントです。<|eot_id|><|start_header_id|>user<|end_header_id|>

{instruction}
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
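As a minimal sketch, the template above can also be assembled manually in Python. build_prompt is a hypothetical helper shown only to make the string layout explicit; in practice, tokenizer.apply_chat_template (used in the usage example below) produces the same format for you.

def build_prompt(system_prompt: str, instruction: str, input_text: str) -> str:
    # Assemble the raw prompt string following the template above.
    user_message = f"{instruction}\n{input_text}"
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )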

Use the instruct model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nitky/Llama-3.3-FakeSwallow-70B-Instruct-v0.1"

# Load the model and tokenizer; device_map="auto" spreads the 70B weights
# across the available GPUs and torch_dtype="auto" keeps the native bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Build the prompt string using the chat template described above.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Merge Details

Merge Method

This model was merged using the task arithmetic merge method, with meta-llama/Llama-3.1-70B as the base.
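As a rough sketch of what task arithmetic does (this is illustrative, not mergekit's actual implementation), each merged tensor is the base tensor plus a weighted sum of each donor model's difference from the base:

import torch

def task_arithmetic(base: torch.Tensor, donors: list[tuple[torch.Tensor, float]]) -> torch.Tensor:
    # merged = base + sum_i( weight_i * (model_i - base) )
    merged = base.clone()
    for tensor, weight in donors:
        merged += weight * (tensor - base)  # add the weighted "task vector"
    return merged

# With the weights from the config below, per tensor:
# merged = base + 1.0 * (swallow - base) + 0.998 * (llama33_instruct - base)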

Models Merged

The following models were included in the merge:

  • tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
  • meta-llama/Llama-3.3-70B-Instruct

Configuration

The following YAML configuration was used to produce this model:

merge_method: task_arithmetic
base_model: meta-llama/Llama-3.1-70B
models:
  - model: tokyotech-llm/Llama-3.1-Swallow-70B-v0.1
    parameters:
      weight: 1.0
  - model: meta-llama/Llama-3.3-70B-Instruct
    parameters:
      weight: 0.998
dtype: bfloat16
name: Llama-3.3-FakeSwallow-70B-Instruct-v0.1
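
To reproduce the merge, a config like this can be passed to mergekit's mergekit-yaml command, e.g. mergekit-yaml config.yml ./output-model-directory (the config filename and output path here are examples, not part of the original card).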