waldie committed
Commit a81df7c
1 Parent(s): fc5ac53

Update README.md

Files changed (1)
  1. README.md +4 -71
README.md CHANGED
@@ -1,80 +1,13 @@
  ---
  license: gemma
- library_name: transformers
  pipeline_tag: text-generation
- base_model: google/gemma-2-27b-it
  tags:
  - alignment-handbook
  - generated_from_trainer
  ---

- # gemma-2-27b-it-SimPO-37K Model Card
-
- ## Implementation Details
- We first followed the [SimPO](https://github.com/princeton-nlp/SimPO) framework to apply [On-Policy Preference Data Generation](https://github.com/princeton-nlp/SimPO/tree/main/on_policy_data_gen) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset using the [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) model. We then selected prompts where the chosen reward was at least 0.01 higher than the rejected reward, resulting in 37,040 training data points.
-
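For illustration only (not part of the original card), a minimal sketch of the reward-gap filter described above; the data file name and the `chosen_reward` / `rejected_reward` column names are assumptions about how the on-policy generation step stores its scores:

```python
from datasets import load_dataset

# Hypothetical on-policy preference data with per-response reward scores;
# the file name and column names are assumptions, not from the original pipeline.
ds = load_dataset("json", data_files="on_policy_preferences.jsonl", split="train")

# Keep only prompts where the chosen response out-scores the rejected one
# by at least 0.01, mirroring the filtering step described above.
filtered = ds.filter(lambda ex: ex["chosen_reward"] - ex["rejected_reward"] >= 0.01)
print(f"{len(filtered)} training examples retained")
```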
- Model training was conducted using 8x80G A800 GPUs, leveraging the [alignment-handbook](https://github.com/huggingface/alignment-handbook) library. We used `deepspeed_zero_stage3` with optimizer offloading to the CPU. The `SimPOTrainer` arguments were as follows:
-
- ```bash
- # SimPOTrainer arguments
- bf16: true
- beta: 10
- gamma_beta_ratio: 0.5
- gradient_accumulation_steps: 8
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
- use_reentrant: true
- hub_model_id: simpo-exps
- learning_rate: 8.0e-7
- log_level: info
- logging_steps: 1
- lr_scheduler_type: cosine
- max_length: 2048
- max_prompt_length: 1800
- num_train_epochs: 1
- optim: adamw_torch
- output_dir: outputs/gemma-2-27b-it-SimPO
- run_name: gemma-2-27b-it-SimPO
- per_device_train_batch_size: 2
- push_to_hub: false
- save_strategy: "steps"
- save_steps: 100
- save_total_limit: 20
- seed: 42
- warmup_ratio: 0.1
- save_only_model: true
- ```
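As a reading aid for `beta` and `gamma_beta_ratio` above (added here, not part of the original card): SimPO (Meng et al., 2024) uses the length-normalized log-likelihood as an implicit reward and a target margin $\gamma$, with the trainer's `gamma_beta_ratio` setting $\gamma / \beta$, so these settings correspond to $\beta = 10$ and $\gamma = 5$:

$$
\mathcal{L}_{\mathrm{SimPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$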
-
- ## Citation
-
- gemma model:
- ```
- @article{gemma_2024,
- title={Gemma},
- url={https://www.kaggle.com/m/3301},
- DOI={10.34740/KAGGLE/M/3301},
- publisher={Kaggle},
- author={Gemma Team},
- year={2024}
- }
- ```
-
- SimPO paper:
- ```
- @article{meng2024simpo,
- title={{SimPO}: Simple preference optimization with a reference-free reward},
- author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
- journal={arXiv preprint arXiv:2405.14734},
- year={2024}
- }
- ```
-
- UltraFeedback paper:
- ```
- @article{cui2023ultrafeedback,
- title={{UltraFeedback}: Boosting language models with high-quality feedback},
- author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
- journal={arXiv preprint arXiv:2310.01377},
- year={2023}
- }
- ```
 
  ---
  license: gemma
+ base_model: AALF/gemma-2-27b-it-SimPO-37K
+ base_model_relation: quantized
  pipeline_tag: text-generation
  tags:
  - alignment-handbook
  - generated_from_trainer
  ---

+ fits into 24 GB of VRAM with 24576 ctx (q4)
+
+ set rope_alpha to 3.75
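A minimal loading sketch for the added notes, under stated assumptions (none of this comes from the card itself): the repo is an ExLlamaV2 (exl2) quant, `rope_alpha` corresponds to exllamav2's NTK RoPE alpha (`scale_alpha_value`), and the model path is a hypothetical local download:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

# Hypothetical local path to the q4 quant of this repo.
model_dir = "/models/gemma-2-27b-it-SimPO-37K-exl2-q4"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()
config.max_seq_len = 24576        # context length the card says fits in 24 GB (q4)
config.scale_alpha_value = 3.75   # assumed mapping of the rope_alpha setting above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # allocate the cache as the model loads
model.load_autosplit(cache)                # split layers across available VRAM
tokenizer = ExLlamaV2Tokenizer(config)
```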