---
library_name: transformers
datasets:
- Intel/orca_dpo_pairs
language:
- en
tags:
- mistral-7b
- mistral
- dpo
- neuralhermes
- instruct
- rlhf
- notebook
- endtoend
license: apache-2.0
---

# NeuralHermes-2.5-Mistral-7B

- Base model: `teknium/OpenHermes-2.5-Mistral-7B`
- Fine-tuned with Direct Preference Optimization (DPO) on the `Intel/orca_dpo_pairs` dataset.

## Uses

### Direct Use 

Way 1 (see `Way 2` below for faster inference)

```python

import transformers
from transformers import AutoTokenizer

new_model = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

```
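
As an aside, Way 1 runs on CPU unless a device is specified, which likely explains much of its slowness. Below is a minimal sketch of pinning the same pipeline to a GPU; the device index and fp16 dtype are my assumptions, not part of the original setup:

```python
import torch
import transformers
from transformers import AutoTokenizer

new_model = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(new_model)

# Same pipeline as above, but loaded in half precision and pinned to a GPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,  # assumption: fp16 is acceptable for inference
    device=0,                   # assumption: GPU 0 is free; use -1 for CPU
)
```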

Sample Output from `abdullahalzubaer/NeuralHermes-2.5-Mistral-7B` 


```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is an artificial intelligence system designed to process and understand large amounts of natural language data.
It's a type of machine learning model, typically built using neural networks,
that is trained on vast datasets of text to learn patterns and relationships within the language.
These models can then generate human-like text, predict the next word in a sequence, perform language translation,
and answer questions, among other tasks. The "large" in the term refers to the size of the model, which includes
the number of parameters, the complexity of the architecture, and the amount of training data it processes.
As a result, large language models are capable of generating more complex and coherent responses compared to smaller models.
```

Sample Output from `mlabonne/NeuralHermes-2.5-Mistral-7B` (provided as in the [tutorial](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html))
```
<|im_start|>system
You are a helpful assistant chatbot.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
A large language model is a type of artificial intelligence (AI) system that has been trained on vast amounts of text data.
These models are designed to understand and generate human language, allowing them to perform various natural
language processing tasks, such as text generation, language translation, and question answering. Large language models
typically use deep learning techniques, like recurrent neural networks (RNNs) or transformers, to learn patterns and
relationships in the data, enabling them to generate coherent and contextually relevant responses.
The size of these models, in terms of the number of parameters and the volume of data they are trained on,
plays a significant role in their ability to comprehend and produce complex language structures.

```

So the fine-tune works: the output is perhaps not quite as good as the original model's, but close to it (possibly because of the lowered `max_length` in `DPOTrainer`?).


Way 2 (significantly faster than Way 1 above, so I recommend it; taken directly from the
[Mistral model card](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.2), with the model name replaced by mine)

```python

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import trl
print(torch.__version__)
print(transformers.__version__)
print(trl.__version__)


'''
1.13.0+cu117
4.38.2
0.7.11
'''


model_tokenizer = "abdullahalzubaer/NeuralHermes-2.5-Mistral-7B" #lets try my model
# model_tokenizer = "mistralai/Mistral-7B-Instruct-v0.2"
# model_tokenizer = "mistralai/Mixtral-8x7B-Instruct-v0.1"

model = AutoModelForCausalLM.from_pretrained(model_tokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_tokenizer)

print(f"Loaded Model = {model.config._name_or_path}")
print(f"Loaded Tokenizer = {tokenizer.name_or_path}")

# Check available GPUs and print their names
gpu_count = torch.cuda.device_count()
print("Available GPUs:", gpu_count)
for i in range(gpu_count):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    
# Choose a specific GPU
device_id = 3  # change this index to select a different GPU
device = f"cuda:{device_id}" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")


your_prompt="""What is a Large Language Model?"""

messages = [
    {"role": "user", "content": your_prompt},
]


encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(f"\nComplete I/O:\n{decoded[0]}")
# print(f"Using device: {device}")
# print(f"\nModel Reply:\n{decoded[0].split('[/INST]')[1]}")

'''
Complete I/O:
<|im_start|> user
What is a Large Language Model? Elaborate.
<|im_end|> 
A Large Language Model is a type of artificial intelligence algorithm
designed to generate human-like text or respond to natural language input.
It is typically trained on vast amounts of text data, enabling it to
understand and generate language with a high level of complexity.<|im_end|>

'''

```
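
Since this model emits ChatML markers rather than Mistral-Instruct's `[/INST]`, the commented-out split above won't isolate the reply. A minimal sketch of extracting just the assistant text, assuming the output shape shown in the sample above:

```python
# Everything after the prompt's closing <|im_end|> is the model's reply;
# strip any remaining end-of-turn / end-of-sequence markers.
reply = decoded[0].split("<|im_end|>", 1)[1]
reply = reply.replace("<|im_end|>", "").replace("</s>", "").strip()
print(f"\nModel Reply:\n{reply}")
```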
# Loss

| Step | Training Loss   |
|-----|---------|
| 1   | 0.693300|
| 2   | 0.693200|
| 3   | 0.692500|
| 4   | 0.691300|
| 5   | 0.689400|
| ... | ...     |
| 45  | 0.633700|
| 46  | 0.629000|
| 47  | 0.591300|
| 48  | 0.558100|
| 49  | 0.585800|
| 50  | 0.558900|
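
For reference, the initial value is exactly what DPO predicts. The (sigmoid) DPO loss is

$$
\mathcal{L}_{\text{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right),
$$

where $y_w$ and $y_l$ are the chosen and rejected responses. At step 1 the policy $\pi_\theta$ still equals the reference $\pi_{\mathrm{ref}}$, so both log-ratios are zero and the loss is $-\log \sigma(0) = \log 2 \approx 0.6931$, matching the 0.6933 in the first row; the decline afterwards shows the model learning to prefer the chosen responses.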

# Hyperparameters

All hyperparameters follow [the tutorial](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html) except the following:
```python

# for TrainingArguments()
dataloader_num_workers=1,     # had to add this #CHANGED_HERE#
dataloader_prefetch_factor=1, # had to add this #CHANGED_HERE#

# for DPOTrainer()
# ref_model was omitted: passing a reference model raised an error (not sure why, though; needs further investigation)
max_prompt_length=256, # lowered to 256 #CHANGED_HERE# to avoid CUDA out-of-memory
max_length=256,        # lowered to 256 #CHANGED_HERE# to avoid CUDA out-of-memory
```
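
For context, here is a minimal sketch of how these arguments slot into a DPO run, following the linked tutorial's structure and the trl 0.7.x API from the version printout above. Everything outside the two changed blocks (output dir, batch size, the dataset-mapping helper, and so on) is illustrative, not a record of the exact run:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # trl 0.7.x constructor API

base_model = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Intel/orca_dpo_pairs rows carry system/question/chosen/rejected fields;
# they must be mapped into the prompt/chosen/rejected columns DPOTrainer
# expects (the tutorial does this with a ChatML formatting function).
dataset = load_dataset("Intel/orca_dpo_pairs")["train"]

def to_dpo_format(row):  # hypothetical helper, mirroring the tutorial
    prompt = tokenizer.apply_chat_template(
        [{"role": "system", "content": row["system"]},
         {"role": "user", "content": row["question"]}],
        tokenize=False, add_generation_prompt=True,
    )
    return {"prompt": prompt,
            "chosen": row["chosen"] + "<|im_end|>\n",
            "rejected": row["rejected"] + "<|im_end|>\n"}

dataset = dataset.map(to_dpo_format, remove_columns=dataset.column_names)

training_args = TrainingArguments(
    output_dir="NeuralHermes-2.5-Mistral-7B",  # illustrative
    per_device_train_batch_size=1,             # illustrative
    dataloader_num_workers=1,                  # changed, see above
    dataloader_prefetch_factor=1,              # changed, see above
)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,          # omitted, see above
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    beta=0.1,                # tutorial default
    max_prompt_length=256,   # changed, see above
    max_length=256,          # changed, see above
)
dpo_trainer.train()
```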

# Reference

Thanks! [Fine-tune Mistral-7b with DPO](https://mlabonne.github.io/blog/posts/Fine_tune_Mistral_7b_with_DPO.html)