Nohobby committed
Commit a4279a6 · verified · 1 Parent(s): 8e8a39b

Update README.md

Files changed (1):
  1. README.md +30 -15
README.md CHANGED

@@ -8,29 +8,44 @@ tags:
 ---
 # merge
 
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+I hadn't tried the untuned MS3 before messing around with the merge, but I don't think it's all that different from this thing. It's not like there's no influence from the tuned adapters at all; it's just less than I expected. That might be for the better, though. The result is usable as is.
 
-## Merge Details
-### Merge Method
-
-This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method, with [Nohobby/ignore_MS3-test-UNHOLY](https://huggingface.co/Nohobby/ignore_MS3-test-UNHOLY) as the base.
-
-### Models Merged
-
-The following models were included in the merge:
-* [unsloth/Mistral-Small-24B-Instruct-2501](https://huggingface.co/unsloth/Mistral-Small-24B-Instruct-2501)
+Will use this as part of upcoming merges when there is enough fuel.
 
-### Configuration
+## Merge Details
 
-The following YAML configuration was used to produce this model:
+### Step1
+```yaml
+models:
+  - model: unsloth/Mistral-Small-24B-Base-2501
+  - model: unsloth/Mistral-Small-24B-Instruct-2501+ToastyPigeon/new-ms-rp-test-ws
+    parameters:
+      select_topk:
+        - value: [0.05, 0.03, 0.02, 0.02, 0.01]
+  - model: unsloth/Mistral-Small-24B-Instruct-2501+estrogen/MS2501-24b-Ink-ep2-adpt
+    parameters:
+      select_topk: 0.1
+  - model: trashpanda-org/MS-24B-Instruct-Mullein-v0
+    parameters:
+      select_topk: 0.4
+base_model: unsloth/Mistral-Small-24B-Base-2501
+merge_method: sce
+parameters:
+  int8_mask: true
+  rescale: true
+  normalize: true
+dtype: bfloat16
+tokenizer_source: base
+```
 
+### Step2
 ```yaml
 dtype: bfloat16
 tokenizer_source: base
 merge_method: della_linear
 parameters:
   density: 0.55
-base_model: Nohobby/ignore_MS3-test-UNHOLY
+base_model: Step1
 models:
   - model: unsloth/Mistral-Small-24B-Instruct-2501
     parameters:
@@ -46,7 +61,7 @@ models:
         - filter: down_proj
           value: [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
         - value: 0
-  - model: Nohobby/ignore_MS3-test-UNHOLY
+  - model: Step1
     parameters:
       weight:
         - filter: v_proj
@@ -60,4 +75,4 @@ models:
         - filter: down_proj
           value: [0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
        - value: 1
-```
+```
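
For anyone who wants to reproduce the two-step recipe, here is a minimal sketch of chaining the configs through mergekit's Python API; file names like `step1.yaml` and the output directories are my own placeholders, not anything fixed by the configs above. Step1 has to be merged first and saved under the local name `Step1`, since Step2 refers to it that way in `base_model` and in its second `models` entry, and the `model+adapter` entries in Step1 are, as I understand it, mergekit's syntax for applying a LoRA adapter on top of the named base before merging. The CLI equivalent would be `mergekit-yaml step1.yaml Step1`, then `mergekit-yaml step2.yaml <output-dir>`.

```python
# A minimal sketch, assuming mergekit's documented Python API; config file
# names and output paths are placeholders, not taken from the commit.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Step2 references "Step1" as a local path, so the first merge must be
# written to a directory with exactly that name.
for config_file, out_dir in (("step1.yaml", "Step1"), ("step2.yaml", "merged")):
    with open(config_file, encoding="utf-8") as fp:
        config = MergeConfiguration.model_validate(yaml.safe_load(fp))
    run_merge(
        config,
        out_dir,
        options=MergeOptions(
            cuda=True,            # set False to merge on CPU
            copy_tokenizer=True,  # copy tokenizer files into the output
            lazy_unpickle=True,   # lower peak memory while loading shards
        ),
    )
```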
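As for the 11-element lists in Step2: my reading of mergekit's gradient syntax is that a list of values gets interpolated linearly across the layer stack, so `[1, 0, 1, 0, 0, ...]` gives the Instruct model's `down_proj` real weight only in two early bands, while `Step1`'s complementary `[0, 1, 0, 1, 1, ...]` list supplies everything after that. A toy illustration of that reading (the 40-layer count for Mistral Small 24B is an assumption, and this is not mergekit's code):

```python
# Toy illustration of mergekit-style layer gradients (my reading of the
# docs, not mergekit's implementation): the anchor list is interpolated
# linearly over the layer index.
import numpy as np

anchors = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # Instruct model's down_proj list
num_layers = 40                               # assumed for Mistral Small 24B

positions = np.linspace(0, len(anchors) - 1, num_layers)
per_layer = np.interp(positions, np.arange(len(anchors)), anchors)
print(per_layer.round(2))
# Early layers oscillate between the two sources; from roughly layer 12
# onward the weight is 0, i.e. down_proj comes entirely from Step1.
```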