---
library_name: transformers
datasets:
- Intel/orca_dpo_pairs
language:
- en
tags:
- mistral-7b
- mistral
- dpo
- neuralhermes
- instruct
- rlhf
- notebook
- endtoend
license: apache-2.0
---
- Base model: `teknium/OpenHermes-2.5-Mistral-7B`
- Fine-tuned with Direct Preference Optimization (DPO) on the `Intel/orca_dpo_pairs` dataset.
## Uses
### Direct Use
Way 1 (see Way 2 below for faster inference)
```python
import transformers
from transformers import AutoTokenizer

new_model = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]["generated_text"])
```
Sample Output from `abdullahalzubaer/NeuralHermes-2.5-Mistral-7B`
```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is an artificial intelligence system designed to process and understand large amounts of natural language data.
It's a type of machine learning model, typically built using neural networks,
that is trained on vast datasets of text to learn patterns and relationships within the language.
These models can then generate human-like text, predict the next word in a sequence, perform language translation,
and answer questions, among other tasks. The "large" in the term refers to the size of the model, which includes
the number of parameters, the complexity of the architecture, and the amount of training data it processes.
As a result, large language models are capable of generating more complex and coherent responses compared to smaller models.
```
Sample Output from `mlabonne/NeuralHermes-2.5-Mistral-7B` (as given in the [tutorial](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html))
```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data.
These models are designed to understand and generate human language, allowing them to perform various natural
language processing tasks, such as text generation, language translation, and question answering. Large language models
typically use deep learning techniques, like recurrent neural networks (RNNs) or transformers, to learn patterns and
relationships in the data, enabling them to generate coherent and contextually relevant responses.
The size of these models, in terms of the number of parameters and the volume of data they are trained on,
plays a significant role in their ability to comprehend and produce complex language structures.
```
So the fine-tuning worked: the output is perhaps not as good as the original model's, but it is close. The gap may be due to the lower `max_length` I had to use in `DPOTrainer`.
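Both sample outputs above contain the full ChatML prompt followed by the reply. A minimal sketch for keeping only the assistant's reply is shown below; `extract_assistant_reply` is a hypothetical helper (not part of `transformers`), and it assumes the ChatML markers `<|im_start|>` and `<|im_end|>` seen in the outputs.

```python
def extract_assistant_reply(generated_text: str) -> str:
    """Return only the assistant's part of a ChatML-formatted string."""
    marker = "<|im_start|>assistant"
    # Keep everything after the last assistant header.
    _, found, reply = generated_text.rpartition(marker)
    if not found:
        # No assistant header: return the text unchanged rather than guess.
        return generated_text.strip()
    # Drop the end-of-turn token if the model emitted one.
    return reply.split("<|im_end|>")[0].strip()


# Shortened, made-up example string in the same format as the samples above.
sample = (
    "<|im_start|>system\nYou are a helpful assistant chatbot.<|im_end|>\n"
    "<|im_start|>user\nWhat is a Large Language Model?<|im_end|>\n"
    "<|im_start|>assistant\nA large language model is a type of AI system."
)
print(extract_assistant_reply(sample))
```

With the pipeline from Way 1 this could be called as `extract_assistant_reply(sequences[0]["generated_text"])`.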
Way 2 (I am not sure why, but this is significantly faster than Way 1 above, so I recommend it. Taken directly from the
[Mistral model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), with the model name replaced by mine.)
```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import trl  # only needed to print the installed version
print(torch.__version__)
print(transformers.__version__)
print(trl.__version__)
'''
1.13.0+cu117
4.38.2
0.7.11
'''
model_tokenizer = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"  # let's try my model
# model_tokenizer = "mistralai/Mistral-7B-Instruct-v0.2"
# model_tokenizer = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_tokenizer)
print(f"Loaded Model = {model.config._name_or_path}")
print(f"Loaded Tokenizer = {tokenizer.name_or_path}")
# Check available GPUs and print their names
gpu_count = torch.cuda.device_count()
print("Available GPUs:", gpu_count)
for i in range(gpu_count):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
# Choose a specific GPU (here GPU 3; it must exist on your machine)
device_id = 3  # Change this to select a different GPU
device = f"cuda:{device_id}" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
your_prompt="""What is a Large Language Model?"""
messages = [
{"role": "user", "content": your_prompt},
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(f"\nComplete I/O:\n{decoded[0]}")
# print(f"Using device: {device}")
# print(f"\nModel Reply:\n{decoded[0].split('[/INST]')[1]}")
'''
Complete I/O:
<|im_start|> user
What is a Large Language Model? Elaborate.
<|im_end|>
A Large Language Model is a type of artificial intelligence algorithm
designed to generate human-like text or respond to natural language input.
It is typically trained on vast amounts of text data, enabling it to
understand and generate language with a high level of complexity.<|im_end|>
'''
```
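The hard-coded `device_id = 3` in the script above only works on a machine with at least four GPUs. A small sketch of a safer choice is shown below; `pick_device` is a hypothetical helper, and the fallback to `cuda:0` is an assumption about what you would want, not something the original script does.

```python
def pick_device(requested_id: int, gpu_count: int) -> str:
    """Return a torch-style device string that is valid for this machine."""
    if gpu_count == 0:
        # No GPU visible at all: run on CPU.
        return "cpu"
    if 0 <= requested_id < gpu_count:
        return f"cuda:{requested_id}"
    # Requested index does not exist: fall back to the first GPU (assumed preference).
    return "cuda:0"


print(pick_device(3, 8))  # cuda:3
print(pick_device(3, 1))  # cuda:0
print(pick_device(3, 0))  # cpu
```

In the script this would be used as `device = pick_device(device_id, torch.cuda.device_count())`.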
# Loss
| Step | Training Loss |
|------|---------------|
| 1    | 0.693300      |
| 2    | 0.693200      |
| 3    | 0.692500      |
| 4    | 0.691300      |
| 5    | 0.689400      |
| ...  | ...           |
| 45   | 0.633700      |
| 46   | 0.629000      |
| 47   | 0.591300      |
| 48   | 0.558100      |
| 49   | 0.585800      |
| 50   | 0.558900      |
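The starting value of about 0.693 is what one would expect: it is close to ln 2, the value of the DPO loss when the policy and the reference model assign the same log-probabilities. The sketch below assumes the standard sigmoid DPO loss; `dpo_loss` is a hypothetical helper, `beta=0.1` is an assumed value, and the log-probabilities are made up for illustration.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference (the situation at the start of training).
print(round(dpo_loss(-50.0, -60.0, -50.0, -60.0), 4))  # 0.6931

# Policy has moved towards the chosen answer: the loss drops below ln 2.
print(dpo_loss(-45.0, -60.0, -50.0, -60.0) < math.log(2))  # True
```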
# Hyperparameters
All hyperparameters are as [here](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html) except the following:
```python
# for TrainingArguments()
dataloader_num_workers=1,  # had to add this #CHANGED_HERE#
dataloader_prefetch_factor=1,
# for DPOTrainer()
# ref_model: not passed. Including a reference model raised an error saying it is
# not required; I am not sure why, this needs further investigation.
max_prompt_length=256,  # had to lower this to 256 #CHANGED_HERE#, otherwise CUDA out of memory
max_length=256,  # had to lower this to 256 #CHANGED_HERE#, otherwise CUDA out of memory
```
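To get a feel for what `max_prompt_length=256` together with `max_length=256` can mean, here is a rough sketch of a prompt-first truncation rule. This is an assumption about how the trainer truncates, not a copy of TRL's code, so check it against the TRL version you have installed. `truncated_lengths` is a hypothetical helper, and the token counts and the larger limits in the last call are illustrative only.

```python
def truncated_lengths(prompt_len, response_len, max_length=256, max_prompt_length=256):
    """Token counts kept for one example under an assumed prompt-first truncation rule."""
    # Step 1 (assumed): if the pair is too long, cut the prompt to max_prompt_length.
    if prompt_len + response_len > max_length:
        prompt_len = min(prompt_len, max_prompt_length)
    # Step 2 (assumed): if still too long, the response gets whatever budget is left.
    if prompt_len + response_len > max_length:
        response_len = min(response_len, max_length - max_prompt_length)
    return prompt_len, response_len


print(truncated_lengths(100, 100))  # (100, 100): fits, nothing is cut
print(truncated_lengths(300, 200))  # (256, 0): no budget left for the response
print(truncated_lengths(300, 200, max_length=1536, max_prompt_length=1024))  # (300, 200)
```

Under this model, setting both limits to the same value leaves no room for the response whenever an example has to be truncated, which could be one reason for the gap to the original model noted earlier.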
# Reference
Thanks! https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html