---
license: llama3
datasets:
- mzbac/glaive-function-calling-v2-llama-3-format
language:
- en
---

# Model

This model was fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct using mlx-lm.

**Note:** The glaive-function-calling-v2 dataset contains some invalid JSON and uses single quotes for argument values. I have retrained the model on cleaned-up data. If you run into problems with the function-calling JSON format, try the newer version: https://huggingface.co/mzbac/llama-3-8B-Instruct-function-calling-v0.2

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mzbac/llama-3-8B-Instruct-function-calling"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tool = {
    "name": "search_web",
    "description": "Perform a web search for a given search terms.",
    "parameter": {
        "type": "object",
        "properties": {
            "search_terms": {
                "type": "array",
                "items": {"type": "string"},
                "description": "The search queries for which the search is performed.",
                "required": True,
            }
        }
    },
}

messages = [
    {
        "role": "system",
        "content": f"You are a helpful assistant with access to the following functions. Use them if required - {str(tool)}",
    },
    {"role": "user", "content": "Today's news in Melbourne, just for your information, today is April 27, 2014."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.1,
)
response = outputs[0]
print(tokenizer.decode(response))

# <|begin_of_text|><|start_header_id|>system<|end_header_id|>

# You are a helpful assistant with access to the following functions. Use them if required - {'name':'search_web', 'description': 'Perform a web search for a given search terms.', 'parameter': {'type': 'object', 'properties': {'search_terms': {'type': 'array', 'items': {'type':'string'}, 'description': 'The search queries for which the search is performed.','required': True}}}}<|eot_id|><|start_header_id|>user<|end_header_id|>

# Today's news in Melbourne, just for your information, today is April 27, 2014.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

# <functioncall> {"name": "search_web", "arguments": '{"search_terms": ["Melbourne news", "April 27, 2014"]}'}<|eot_id|>
```
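The sample output above wraps the `arguments` value in single quotes, so the `<functioncall>` payload as a whole is not valid JSON and `json.loads` cannot read it directly. The following is a minimal sketch of one way to extract the call, assuming the output always follows the shape shown above. `parse_function_call` is a hypothetical helper written for this example; it is not part of this repository or of `transformers`.

```python
import json
import re

# Hypothetical helper for illustration; assumes the output shape shown above.
FUNCTIONCALL_RE = re.compile(r"<functioncall>\s*(\{.*\})", re.DOTALL)


def parse_function_call(text):
    """Return (name, arguments_dict) from decoded model output, or None."""
    match = FUNCTIONCALL_RE.search(text)
    if match is None:
        return None
    payload = match.group(1)
    # The arguments value is a single-quoted string that itself holds JSON,
    # so extract the two fields separately instead of parsing the payload.
    name_match = re.search(r'"name"\s*:\s*"([^"]+)"', payload)
    args_match = re.search(r'"arguments"\s*:\s*\'(.*)\'\s*\}', payload, re.DOTALL)
    if name_match is None or args_match is None:
        return None
    try:
        arguments = json.loads(args_match.group(1))
    except json.JSONDecodeError:
        # The model may still emit malformed JSON inside the quotes.
        return None
    return name_match.group(1), arguments


sample = (
    '<functioncall> {"name": "search_web", "arguments": '
    "'{\"search_terms\": [\"Melbourne news\", \"April 27, 2014\"]}'"
    "}<|eot_id|>"
)
call = parse_function_call(sample)
print(call)
```

Returning `None` rather than raising lets the caller fall back to treating the response as plain text when no well-formed call is found.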

## Training hyperparameters

lora_config.yaml

```yaml
# The path to the local model directory or Hugging Face repo.
model: "meta-llama/Meta-Llama-3-8B-Instruct"
# Whether or not to train (boolean)
train: true

# Directory with {train, valid, test}.jsonl files
data: "data"

# The PRNG seed
seed: 0

# Number of layers to fine-tune
lora_layers: 32

# Minibatch size.
batch_size: 1

# Iterations to train for.
iters: 6000

# Number of validation batches, -1 uses the entire validation set.
val_batches: 25

# Adam learning rate.
learning_rate: 1e-6

# Number of training steps between loss reporting.
steps_per_report: 10

# Number of training steps between validations.
steps_per_eval: 200

# Load path to resume training with the given adapter weights.
resume_adapter_file: null

# Save/load path for the trained adapter weights.
adapter_path: "adapters"

# Save the model every N iterations.
save_every: 1000

# Evaluate on the test set after training
test: false

# Number of test set batches, -1 uses the entire test set.
test_batches: 100

# Maximum sequence length.
max_seq_length: 8192

# Use gradient checkpointing to reduce memory use.
grad_checkpoint: false

# LoRA parameters can only be specified in a config file
lora_parameters:
  # The layer keys to apply LoRA to.
  # These will be applied for the last lora_layers
  keys: ['mlp.gate_proj', 'mlp.down_proj', 'self_attn.q_proj', 'mlp.up_proj', 'self_attn.o_proj','self_attn.v_proj', 'self_attn.k_proj']
  rank: 128
  alpha: 256
  scale: 10.0
  dropout: 0.05

# Schedule can only be specified in a config file, uncomment to use.
#lr_schedule:
#  name: cosine_decay
#  warmup: 100 # 0 for no warmup
#  warmup_init: 1e-7 # 0 if not specified
#  arguments: [1e-6, 1000, 1e-7] # passed to scheduler
```
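To give a sense of the adapter size this configuration implies, here is a rough back-of-the-envelope count of trainable LoRA parameters. It is a sketch under stated assumptions: the projection shapes come from the public Llama 3 8B configuration (hidden size 4096, MLP size 14336, key/value width 1024), not from this model card, and each adapted layer is assumed to add two bias-free low-rank matrices, as in standard LoRA.

```python
# Assumed Llama 3 8B dimensions (from the public model config, not this card).
HIDDEN, INTERMEDIATE, KV_DIM = 4096, 14336, 1024

# (in_features, out_features) for each key listed in lora_parameters.keys.
PROJECTION_SHAPES = {
    "self_attn.q_proj": (HIDDEN, HIDDEN),
    "self_attn.k_proj": (HIDDEN, KV_DIM),
    "self_attn.v_proj": (HIDDEN, KV_DIM),
    "self_attn.o_proj": (HIDDEN, HIDDEN),
    "mlp.gate_proj": (HIDDEN, INTERMEDIATE),
    "mlp.up_proj": (HIDDEN, INTERMEDIATE),
    "mlp.down_proj": (INTERMEDIATE, HIDDEN),
}

RANK = 128        # lora_parameters.rank
NUM_LAYERS = 32   # lora_layers


def lora_params(in_features, out_features, rank):
    # Standard LoRA adds A (in_features x rank) and B (rank x out_features).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTION_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
# 10,485,760 per layer, 335,544,320 total
```

Under these assumptions the adapters hold roughly 335M trainable parameters, a few percent of the 8B base model.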