---
license: mit
language:
- en
tags:
- Human Preference Alignment
- large language models
datasets:
- openbmb/UltraFeedback
---

# Introduction
MetaAligner-UltraFeedback-1.1B is part of the <em>MetaAligner</em> project, the first policy-agnostic and generalizable method for multi-objective preference alignment of large
language models. This model is fine-tuned from the TinyLLaMA-1.1B foundation model on
the dynamic multi-objective dataset built from the openbmb/UltraFeedback dataset. It is trained to align the responses of another general AI assistant to
a single-turn query. The queries include professional questions on topics such as programming languages and
history, so the aligned responses are usually more complex.
The model is expected to perform multi-objective alignment
efficiently, without tuning the policy models or accessing their parameters. <em>MetaAligner</em> also performs zero-shot preference alignment
for unseen objectives. To our knowledge, this work marks the first attempt at generalizable multi-objective
preference alignment. Experimental results show that MetaAligner can simultaneously perform effective alignment for multiple unseen objectives
while maintaining performance on aligned objectives.

# Dataset
This model is trained based on the following released dataset: 

# Usage
With the Hugging Face Transformers library, you can use the MetaAligner-UltraFeedback-1.1B model in your Python project. Here is a simple example of how to load the model:

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
tokenizer = LlamaTokenizer.from_pretrained('MetaAligner/MetaAligner-UltraFeedback-1.1B', padding_side='left')
model = LlamaForCausalLM.from_pretrained('MetaAligner/MetaAligner-UltraFeedback-1.1B', device_map='auto', torch_dtype=torch.bfloat16)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

In this example, `LlamaTokenizer` loads the tokenizer and `LlamaForCausalLM` loads the model. The `device_map='auto'` argument places the model weights on the
available devices automatically, using the GPU if one is available.

You can align any output obtained from another foundation model with your self-defined objectives. For example, take the following query from UltraFeedback,
an unaligned output from LLaMA2-7B,
and a set of target objectives:

```python
all_aspects = {'instruction_following': 'Instruction following: the response should follow the instructions of the query',
               'honesty': 'Honesty: the response should not tell lies',
               'truthfulness': 'Truthfulness: the response should actively making known all the full truth of a matter',
               'helpfulness': 'Helpfulness: the response should provide useful resources and suggestions to the user',
               'speci': 'Specificity: the response should refer to facts and details and avoid vague arguments.',
               'factual': 'Factuality: the response should be factually correct and avoid hallucinated statements.',
               'read': 'Readability: the response should be easy to read and understand, not too technical for laymen.',
               'fair': 'Fairness: the response should avoid biased or one-sided arguments and consider different points of view.',
               'repeat': 'Repetition: the response should avoid repetitive statements of one point.',
               'len': 'Length: the response should be concise and avoid redundant content.'}
aspects = ['instruction_following', 'honesty', 'truthfulness', 'helpfulness']
question = "What is the square root of banana?"
answer = "The square root of banana is not a well-defined mathematical concept, as bananas are not objects that can be squared. Bananas are a type of fruit that grow on plants and have a specific shape and size. They do not have any mathematical properties or values that can be used in mathematical operations. Therefore, the square root of banana is not a valid or meaningful mathematical concept. It is important to be aware of the limitations of mathematical language and symbols, and to use them correctly and consistently in order to avoid confusion or misinterpretation."
```
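
Because each objective is just a short text description, a self-defined objective can be added by extending the dictionary before selecting keys. The sketch below is a minimal illustration under stated assumptions: the `politeness` entry and the names `base_aspects`, `custom_aspects`, and `selected` are hypothetical and not part of the released objectives.

```python
# A small subset of the objective descriptions shown above.
base_aspects = {
    'honesty': 'Honesty: the response should not tell lies',
    'helpfulness': 'Helpfulness: the response should provide useful resources and suggestions to the user',
}

# Hypothetical self-defined objective (an assumption for illustration only).
custom_aspects = dict(base_aspects)
custom_aspects['politeness'] = 'Politeness: the response should use a respectful and courteous tone.'

# Select objectives by key and fail early on a misspelled key.
selected = ['honesty', 'politeness']
missing = [key for key in selected if key not in custom_aspects]
if missing:
    raise KeyError(f"Unknown objectives: {missing}")

# Objective descriptions are joined with '; ' before being placed in the prompt.
aspect_text = '; '.join(custom_aspects[key] for key in selected)
print(aspect_text)
```

How well the model follows an objective it has never seen depends on the description, so results for such custom entries should be checked manually.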

To ensure the best performance, use the following template to prompt <em>MetaAligner</em>:

```python
query_prompt = 'You are an assistant to human. You will be provided with a query and an answer. Consider the query, ' \
               'then edit the answer to improve it considering these aspects: {aspects} | ' \
               'Query: {question} | Answer: {answer} | Edit: '
aspects = [all_aspects[i] for i in aspects]
aligner_queries = [query_prompt.format(aspects='; '.join(aspects), question=question, answer=str(answer))]
```
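
To align several query/answer pairs with the same objectives, the template can be wrapped in a small helper. This is a sketch rather than part of the released code: the function name `build_aligner_queries`, the constant `QUERY_PROMPT`, and the second example pair are assumptions made for illustration.

```python
# Same template text as above, stored as a single constant.
QUERY_PROMPT = (
    'You are an assistant to human. You will be provided with a query and an answer. Consider the query, '
    'then edit the answer to improve it considering these aspects: {aspects} | '
    'Query: {question} | Answer: {answer} | Edit: '
)


def build_aligner_queries(pairs, aspect_descriptions, template=QUERY_PROMPT):
    """Return one prompt per (question, answer) pair, all sharing the same objectives."""
    aspect_text = '; '.join(aspect_descriptions)
    return [
        template.format(aspects=aspect_text, question=question, answer=str(answer))
        for question, answer in pairs
    ]


# Objective descriptions copied from the dictionary above.
descriptions = [
    'Honesty: the response should not tell lies',
    'Helpfulness: the response should provide useful resources and suggestions to the user',
]
# Hypothetical (question, answer) pairs used only to exercise the helper.
pairs = [
    ("What is the square root of banana?", "It is not a well-defined mathematical concept."),
    ("Who wrote Hamlet?", "Shakespeare."),
]
queries = build_aligner_queries(pairs, descriptions)
print(len(queries))
```

The returned list can be passed to the tokenizer in place of a single-element list of prompts.
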
You can obtain an aligned response using the following code:

```python
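# Tokenize the prompt and move the input IDs and attention mask to the target device
inputs = tokenizer(aligner_queries, return_tensors="pt", padding=True).to(device)
input_ids = inputs.input_ids
generate_ids = model.generate(input_ids, attention_mask=inputs.attention_mask, max_new_tokens=1024)
# Drop the prompt tokens so that only the newly generated edit is decoded
trunc_ids = generate_ids[0][len(input_ids[0]):]
response = tokenizer.decode(trunc_ids, skip_special_tokens=True, spaces_between_special_tokens=False)
print(response)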
```

One run of MetaAligner-UltraFeedback-1.1B with the code above produced the following response:
```
The square root of a number is the reciprocal of that number. In this case, the square root of a banana is not a valid mathematical concept. Bananas are not a mathematical quantity, and therefore, there is no square root of a banana.
```
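
When several prompts are aligned in one batch, the tokenizer's `padding_side='left'` setting matters: every padded prompt then ends at the same position, so one slice width removes the prompt from every row of the generated output. The toy sketch below illustrates this with plain Python lists instead of tensors. The token IDs, the pad ID `0`, and the helper names `left_pad` and `strip_prompts` are made up for illustration, and no model is called.

```python
PAD = 0  # hypothetical pad token ID, for illustration only


def left_pad(batch, pad_id=PAD):
    """Pad every sequence on the left to the length of the longest one."""
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + seq for seq in batch]


def strip_prompts(generated, prompt_width):
    """Keep only the tokens that follow the (padded) prompt in each row."""
    return [seq[prompt_width:] for seq in generated]


prompts = [[11, 12, 13], [21, 22]]
padded = left_pad(prompts)

# Stand-in for generation: a decoder-only model returns prompt + new tokens.
new_tokens = [[101, 102], [201, 202]]
generated = [prompt + new for prompt, new in zip(padded, new_tokens)]

continuations = strip_prompts(generated, len(padded[0]))
print(continuations)  # [[101, 102], [201, 202]]
```

With real tensors, the corresponding slice would be `generate_ids[:, input_ids.shape[1]:]`, and the rows could then be decoded together with `tokenizer.batch_decode`.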

# License

MetaAligner-UltraFeedback-1.1B is licensed under MIT. For more details, please see the MIT file.