Applied DPO (Direct Preference Optimization) to TinyLlama-1.1B-intermediate-step-1431k-3T using the orca_dpo_pairs dataset.

This is an experimental model, created by following the instructions in the blog post Fine-tune a Mistral-7b model with Direct Preference Optimization.

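For context, the DPO step looks roughly like the minimal sketch below. It is not the exact training script: it assumes trl's DPOTrainer with the older API that accepts beta and tokenizer directly, the Intel/orca_dpo_pairs dataset with question/chosen/rejected columns, and illustrative hyperparameters.

# Minimal DPO training sketch -- assumptions: trl's DPOTrainer (older API that
# accepts beta/tokenizer directly), Intel/orca_dpo_pairs column names, and
# illustrative hyperparameters (not the exact values used for this model).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Map the preference pairs to the prompt/chosen/rejected fields DPOTrainer expects
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {
        "prompt": row["question"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    },
    remove_columns=dataset.column_names,
)

training_args = TrainingArguments(
    output_dir="TinyLlama-1.1B-step-1431k-orca-dpo-v1.0",
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=1,
    remove_unused_columns=False,   # keep chosen/rejected columns for the DPO collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,                # a frozen reference copy is created internally
    args=training_args,
    beta=0.1,                      # strength of the KL penalty toward the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_prompt_length=512,
    max_length=1024,
)
trainer.train()
trainer.save_model()
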
You can run this model using the following code:

import transformers
from transformers import AutoTokenizer

# Hugging Face repo id of this model
new_model = "sreeramajay/TinyLlama-1.1B-step-1431k-orca-dpo-v1.0"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

# <s>[INST] <<SYS>>
# You are a helpful assistant chatbot.
# <</SYS>>
#
# What is a Large Language Model? [/INST]
# <LANG-LMT>
# Largely, it is a machine learning model that is trained on a large dataset and is capable of generating large amounts of text with a certain degree of accuracy.
#
# A: If you are talking about a computer program that can generate texts, you can look at the topic of Natural Language Generation (NLG) for a more precise definition.
# The main difference between NLG and machine learning is that NLG is a subfield of AI and is used to generate text from an input, while machine learning is used to analyze data, make predictions and classify it.

Results on the GPT4All benchmark:

Tasks          Metric     Value     Stderr
arc_challenge  acc        0.2807    ± 0.0131
               acc_norm   0.3106    ± 0.0135
arc_easy       acc        0.6107    ± 0.0100
               acc_norm   0.5547    ± 0.0102
boolq          acc        0.5865    ± 0.0086
hellaswag      acc        0.4478    ± 0.0050
               acc_norm   0.5924    ± 0.0049
openbookqa     acc        0.2160    ± 0.0184
               acc_norm   0.3600    ± 0.0215
piqa           acc        0.7280    ± 0.0104
               acc_norm   0.7301    ± 0.0104
winogrande     acc        0.5856    ± 0.0138
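
These numbers can typically be reproduced with EleutherAI's lm-evaluation-harness; the snippet below is a minimal sketch, assuming lm-eval 0.4.x and its simple_evaluate API (the task list, few-shot setting, and batch size are assumptions, not the exact evaluation configuration used).

import lm_eval

# Zero-shot evaluation on the GPT4All-style task suite (assumed configuration).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=sreeramajay/TinyLlama-1.1B-step-1431k-orca-dpo-v1.0",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])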