These seem to work with finetunes too

#3
by gghfez - opened

This surprised me. I used the Mistral-Nemo vectors with Lumimaid-12b and it works exactly as expected.

Not sure if that's the case for all finetunes, but it's worth a try to save having to train new ones.
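For anyone wanting to try this on other fine-tunes: a control vector only has a chance of applying if the fine-tune keeps the base model's hidden size and layer count, since each per-layer vector is added to that layer's hidden state. Below is a minimal Python sketch of that sanity check. It assumes local Hugging Face style `config.json` files with Llama/Mistral-style `hidden_size` and `num_hidden_layers` keys, and the helper names are hypothetical. Matching shapes is necessary but not sufficient: it says nothing about whether the direction still means the same thing after fine-tuning.

```python
import json
from pathlib import Path

# Llama/Mistral-style config keys that fix the shape of each layer's hidden state.
SHAPE_KEYS = ("hidden_size", "num_hidden_layers")


def load_config(model_dir: str) -> dict:
    """Read the Hugging Face style config.json from a local model directory."""
    return json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))


def shapes_match(base_cfg: dict, finetune_cfg: dict) -> bool:
    """True only if both configs define every shape key with the same value."""
    return all(
        key in base_cfg and key in finetune_cfg and base_cfg[key] == finetune_cfg[key]
        for key in SHAPE_KEYS
    )
```

Usage with placeholder paths: `shapes_match(load_config("/path/to/base"), load_config("/path/to/finetune"))`.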

Yeah, it's likely they will work, but I'm just going through a few of the other creative-writing models (mainly 70b) now, as control vectors trained on each fine-tune will likely work slightly better than the base model's ones.

Wow, the downloads are growing exponentially (but the feedback is still constant: still just you).

Are people actually using these control vectors or is it just bots slurping down everything that gets posted on HF? :/

I've now uploaded the control vectors for the last sub-70b model in my training queue.

I have a couple more llama-3/3.1:70b and qwen-2:72b fine-tunes to go.

Then I am going to do some of the better llama-2:70b models (extended-context ones first). These are a hassle, as nearly all are missing the "chat_template" field and require hunting around and/or writing custom Jinja2 templates to work with the HF transformers tokeniser, etc. Some are still very good though, and I think they even hold up against a lot of the newer fine-tunes due to their prose (and fewer GPT-isms).
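For models missing the field, one workaround is to add a `chat_template` entry (a Jinja2 string) to the model's `tokenizer_config.json`, which the HF transformers tokeniser reads when loading. Below is a minimal sketch that assumes an Alpaca-style prompt format purely for illustration (each model card should say which format the fine-tune was actually trained with); `ALPACA_TEMPLATE` and `add_chat_template` are hypothetical names.

```python
import json
from pathlib import Path

# Illustrative Alpaca-style template (an assumption, not any specific model's format).
# The '\\n' escapes become '\n' inside Jinja2 string literals, i.e. real newlines.
ALPACA_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'system' %}{{ message['content'] + '\\n\\n' }}"
    "{% elif message['role'] == 'user' %}"
    "{{ '### Instruction:\\n' + message['content'] + '\\n\\n### Response:\\n' }}"
    "{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + '\\n\\n' }}"
    "{% endif %}"
    "{% endfor %}"
)


def add_chat_template(model_dir: str, template: str = ALPACA_TEMPLATE,
                      overwrite: bool = False) -> bool:
    """Write `template` into tokenizer_config.json; return True if the file changed."""
    path = Path(model_dir) / "tokenizer_config.json"
    config = json.loads(path.read_text(encoding="utf-8"))
    if "chat_template" in config and not overwrite:
        return False  # keep an existing template unless asked to replace it
    config["chat_template"] = template
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return True
```

After patching, rendering a short conversation with `tokenizer.apply_chat_template(messages, tokenize=False)` is a quick way to eyeball whether the template looks right.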

Oh wow thanks, you've been doing a lot of these!

Yeah, I just leave it running and have quite a lot of models saved.

That's the last of the 70B sized models now.

I don't think there is much point in doing all the old 4k-context llama-2:70b fine-tunes (I did do aurelian-v0.1:70b and aurelian-v0.5:70b, as they are technically 32k context and quite interesting).

So now I just have 2x mistral-large:123b and 1x mixtral:8x22b fine-tunes to do, and then I'm done until new models or fine-tunes come out (and are found to be at least semi-decent at writing).

This guy is releasing creative-writing fine-tunes you might be interested in. I haven't had a chance to test them extensively yet.

https://huggingface.co./ArliAI

Here's his explanation of what he's doing differently:

https://old.reddit.com/r/ArliAI/comments/1fd4maa/the_arli_ai_rpmax_v11_series_of_models_38b_8b_12b/

It might be more roleplay-focused, but he's trying to fight the war on slop.

I already saw this and replied in his HF repo: his data-curation method sounds good, but the optimisation method isn't gonna help at all.

@gghfez did you manage to get the 70b model working OK?

Ran out of disk space quantizing it overnight lol. I'll test it once it's done.

Looks like disk space was a red herring. Something seems wrong with the weights: I couldn't exl2-quantize it.

Yeah, somebody else mentioned it didn't like converting to exl2.

Yeah, I just tried again with a fresh download, a fresh exllamav2 build and a different machine, and it failed at the same part :(

-- model.layers.14.self_attn 4.1490 bpw - exp. error: 0.00571387
-- model.layers.14.mlp 2.9043 bpw - exp. error: 0.01260124
-- model.layers.15.self_attn 2.1243 bpw - exp. error: 0.00000000

exllamav2/exllamav2/conversion/optimize.py", line 167, in optimize
logerr += math.log(err)
^^^^^^^^^^^^^
ValueError: math domain error

Not sure if that would cause an issue for your control-vector training. GGUF conversion and quantization worked okay (haven't tested inference on it yet though).
Might be worth excluding that layer, I guess.

I've downloaded it, but I still have 1x mistral-large:123b fine-tune and 1x mixtral:8x22b fine-tune to go for the control vectors, so I'll wait and see.

I wonder if one of the matrices in that layer has been set to all zeros or something? That would explain why you could quantize it "perfectly" (an expected error of exactly 0).
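The traceback is consistent with that guess: `optimize.py` appears to take `math.log` of each layer's expected error, and layer 15's `self_attn` reports an expected error of exactly 0.00000000, for which `math.log` raises `ValueError`. Below is a minimal sketch of checking for all-zero weight matrices. It assumes the weights have already been loaded into a dict of nested lists (with real checkpoint shards you would load each tensor from the safetensors files instead), and the helper and tensor names are hypothetical.

```python
import math


def log_expected_error(err: float) -> float:
    # math.log is undefined for err <= 0, so an expected error of exactly 0.0
    # raises ValueError, as in the optimize.py traceback.
    return math.log(err)


def is_all_zero(matrix) -> bool:
    """True if every entry of a nested-list matrix is zero."""
    return all(value == 0 for row in matrix for value in row)


def find_all_zero_matrices(weights: dict) -> list:
    """Sorted names of the weight matrices whose entries are all zero."""
    return sorted(name for name, matrix in weights.items() if is_all_zero(matrix))


# Hypothetical tensor names and values, purely for illustration.
example = {
    "model.layers.15.self_attn.o_proj.weight": [[0.0, 0.0], [0.0, 0.0]],
    "model.layers.15.self_attn.q_proj.weight": [[0.1, 0.0], [0.0, 0.2]],
}
print(find_all_zero_matrices(example))  # ['model.layers.15.self_attn.o_proj.weight']
```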

The magnum-v2-123b model is the third weird model I've had now that seems "overcooked" on descriptive storytelling:

Loading pre/post prompt stems from 'data/prompt_stems.json'... Done (50 + 50 loaded).
Loading prompt continuations from 'data/writing_style_continuations/storytelling.json'... Done (3 classes; each with 10 continuations loaded).
Loading writing prompts from 'data/writing_prompts.txt'... Done (11835 loaded).
Generating dataset samples... Done ([3 classes x 4096 prompts] 12288 generated).
Loading '/mnt/data/magnum-v2-123b' model and tokenizer...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [01:09<00:00,  1.36s/it]
Tokenizing prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12288/12288 [00:05<00:00, 2196.94it/s]
Sampling hidden states: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12288/12288 [3:56:42<00:00,  1.16s/it]
Saving to 'magnum-v2:123b-storytelling__hidden_state_samples.pt'... Done.
Testing Eigenvector Directions for layers 1 to 87:
- Layer 1: [1/12288 filtered] [1/12288 selected] Δ = 176%, Δσ² = 63.8%, σ= (0.002, 0.001), μ = (-0.002, 0.002 [51.9%]) -->  μ' = (0.000, -0.002, 0.002)
- Layer 2: [1/12288 filtered] [1/12288 selected] Δ = 189%, Δσ² = 65.4%, σ= (0.001, 0.001), μ = (-0.002, 0.002 [50.8%]) -->  μ' = (0.000, -0.002, 0.002)
- Layer 3: [1/12288 filtered] [1/12288 selected] Δ = 185%, Δσ² = 64.9%, σ= (0.001, 0.001), μ = (-0.001, 0.001 [50.4%]) -->  μ' = (0.000, -0.001, 0.001)
- Layer 4: [1/12288 filtered] [1/12288 selected] Δ = 190%, Δσ² = 65.5%, σ= (0.001, 0.001), μ = (-0.001, 0.001 [51.8%]) -->  μ' = (0.000, -0.001, 0.001)
- Layer 5: [1/12288 filtered] [1/12288 selected] Δ = 213%, Δσ² = 68.0%, σ= (0.001, 0.001), μ = (-0.001, 0.001 [51.3%]) -->  μ' = (0.000, -0.001, 0.001)
- Layer 6: [1/12288 filtered] [1/12288 selected] Δ = 276%, Δσ² = 73.4%, σ= (0.001, 0.001), μ = (-0.002, 0.002 [51.0%]) -->  μ' = (0.000, -0.002, 0.002)
- Layer 7: [1/12288 filtered] [1/12288 selected] Δ = 555%, Δσ² = 84.7%, σ= (0.002, 0.001), μ = (-0.003, 0.003 [49.4%]) -->  μ' = (-0.000, -0.003, 0.003)
- Layer 8: [1/12288 filtered] [1/12288 selected] Δ = 349%, Δσ² = 77.7%, σ= (0.002, 0.002), μ = (-0.004, 0.004 [50.6%]) -->  μ' = (0.000, -0.004, 0.004)
- Layer 9: [1/12288 filtered] [1/12288 selected] Δ = 496%, Δσ² = 83.2%, σ= (0.002, 0.002), μ = (-0.003, 0.003 [50.7%]) -->  μ' = (0.000, -0.003, 0.003)
- Layer 10: [1/12288 filtered] [1/12288 selected] Δ = 501%, Δσ² = 83.4%, σ= (0.003, 0.002), μ = (-0.006, 0.006 [49.8%]) -->  μ' = (-0.000, -0.006, 0.006)
- Layer 11: [1/12288 filtered] [1/12288 selected] Δ = 602%, Δσ² = 85.8%, σ= (0.003, 0.003), μ = (-0.008, 0.008 [50.0%]) -->  μ' = (0.000, -0.008, 0.008)
- Layer 12: [1/12288 filtered] [1/12288 selected] Δ = 578%, Δσ² = 85.2%, σ= (0.002, 0.003), μ = (-0.006, 0.006 [50.2%]) -->  μ' = (0.000, -0.006, 0.006)
- Layer 13: [1/12288 filtered] [1/12288 selected] Δ = 635%, Δσ² = 86.4%, σ= (0.010, 0.006), μ = (-0.021, 0.020 [48.6%]) -->  μ' = (-0.001, -0.021, 0.021)
- Layer 14: [1/12288 filtered] [1/12288 selected] Δ = 505%, Δσ² = 83.5%, σ= (0.005, 0.004), μ = (-0.010, 0.010 [48.8%]) -->  μ' = (-0.000, -0.010, 0.010)
- Layer 15: [1/12288 filtered] [1/12288 selected] Δ = 642%, Δσ² = 86.5%, σ= (0.005, 0.004), μ = (-0.012, 0.012 [49.1%]) -->  μ' = (-0.000, -0.012, 0.012)
- Layer 16: [1/12288 filtered] [1/12288 selected] Δ = 553%, Δσ² = 84.7%, σ= (0.007, 0.005), μ = (-0.015, 0.015 [49.6%]) -->  μ' = (-0.000, -0.015, 0.015)
- Layer 17: [1/12288 filtered] [1/12288 selected] Δ = 444%, Δσ² = 81.6%, σ= (0.006, 0.006), μ = (-0.013, 0.012 [49.2%]) -->  μ' = (-0.000, -0.013, 0.013)
- Layer 18: [1/12288 filtered] [1/12288 selected] Δ = 592%, Δσ² = 85.5%, σ= (0.007, 0.007), μ = (-0.017, 0.017 [49.4%]) -->  μ' = (-0.000, -0.017, 0.017)
- Layer 19: [1/12288 filtered] [1/12288 selected] Δ = 352%, Δσ² = 77.9%, σ= (0.010, 0.009), μ = (-0.018, 0.018 [49.8%]) -->  μ' = (-0.000, -0.018, 0.018)
- Layer 20: [1/12288 filtered] [1/12288 selected] Δ = 284%, Δσ² = 73.9%, σ= (0.010, 0.010), μ = (-0.017, 0.017 [50.5%]) -->  μ' = (0.000, -0.017, 0.017)
- Layer 21: [1/12288 filtered] [1/12288 selected] Δ = 492%, Δσ² = 83.1%, σ= (0.018, 0.010), μ = (-0.034, 0.032 [48.6%]) -->  μ' = (-0.001, -0.033, 0.033)
- Layer 22: [1/12288 filtered] [1/12288 selected] Δ = 456%, Δσ² = 82.0%, σ= (0.032, 0.015), μ = (-0.057, 0.050 [46.9%]) -->  μ' = (-0.003, -0.054, 0.054)
- Layer 23: [1/12288 filtered] [1/12288 selected] Δ = 525%, Δσ² = 84.0%, σ= (0.024, 0.015), μ = (-0.047, 0.045 [48.8%]) -->  μ' = (-0.001, -0.046, 0.046)
- Layer 24: [1/12288 filtered] [1/12288 selected] Δ = 496%, Δσ² = 83.2%, σ= (0.017, 0.012), μ = (-0.034, 0.032 [48.2%]) -->  μ' = (-0.001, -0.033, 0.033)
- Layer 25: [1/12288 filtered] [1/12288 selected] Δ = 548%, Δσ² = 84.6%, σ= (0.016, 0.012), μ = (-0.034, 0.031 [47.7%]) -->  μ' = (-0.001, -0.032, 0.032)
- Layer 26: [1/12288 filtered] [1/12288 selected] Δ = 580%, Δσ² = 85.3%, σ= (0.021, 0.015), μ = (-0.045, 0.042 [48.3%]) -->  μ' = (-0.001, -0.043, 0.043)
- Layer 27: [1/12288 filtered] [1/12288 selected] Δ = 528%, Δσ² = 84.1%, σ= (0.020, 0.013), μ = (-0.041, 0.037 [47.8%]) -->  μ' = (-0.002, -0.039, 0.039)
- Layer 28: [1/12288 filtered] [1/12288 selected] Δ = 507%, Δσ² = 83.5%, σ= (0.020, 0.016), μ = (-0.042, 0.038 [47.5%]) -->  μ' = (-0.002, -0.040, 0.040)
- Layer 29: [1/12288 filtered] [1/12288 selected] Δ = 390%, Δσ² = 79.6%, σ= (0.022, 0.018), μ = (-0.041, 0.039 [48.5%]) -->  μ' = (-0.001, -0.040, 0.040)
- Layer 30: [1/12288 filtered] [1/12288 selected] Δ = 434%, Δσ² = 81.3%, σ= (0.021, 0.020), μ = (-0.045, 0.041 [47.9%]) -->  μ' = (-0.002, -0.043, 0.043)
- Layer 31: [1/12288 filtered] [1/12288 selected] Δ = 490%, Δσ² = 83.1%, σ= (0.038, 0.019), μ = (-0.072, 0.060 [45.5%]) -->  μ' = (-0.006, -0.066, 0.066)
- Layer 32: [1/12288 filtered] [1/12288 selected] Δ = 396%, Δσ² = 79.8%, σ= (0.044, 0.029), μ = (-0.081, 0.068 [45.7%]) -->  μ' = (-0.006, -0.075, 0.075)
- Layer 33: [1/12288 filtered] [1/12288 selected] Δ = 393%, Δσ² = 79.7%, σ= (0.057, 0.026), μ = (-0.098, 0.077 [44.1%]) -->  μ' = (-0.010, -0.088, 0.088)
- Layer 34: [2/12288 filtered] [1/12288 selected] Δ = 434%, Δσ² = 81.3%, σ= (0.131, 0.067), μ = (-0.256, 0.176 [40.8%]) -->  μ' = (-0.040, -0.216, 0.216)
- Layer 35: [2/12288 filtered] [1/12288 selected] Δ = 461%, Δσ² = 82.2%, σ= (0.100, 0.058), μ = (-0.206, 0.145 [41.3%]) -->  μ' = (-0.031, -0.176, 0.176)
- Layer 36: [2/12288 filtered] [1/12288 selected] Δ = 590%, Δσ² = 85.5%, σ= (0.091, 0.045), μ = (-0.208, 0.141 [40.4%]) -->  μ' = (-0.034, -0.174, 0.174)
- Layer 37: [1/12288 filtered] [1/12288 selected] Δ = 570%, Δσ² = 85.1%, σ= (0.086, 0.049), μ = (-0.191, 0.142 [42.6%]) -->  μ' = (-0.025, -0.166, 0.166)
- Layer 38: [2/12288 filtered] [1/12288 selected] Δ = 672%, Δσ² = 87.0%, σ= (0.259, 0.211), μ = (-0.728, 0.497 [40.6%]) -->  μ' = (-0.116, -0.613, 0.613)
- Layer 39: [1/12288 filtered] [1/12288 selected] Δ = 738%, Δσ² = 88.1%, σ= (0.129, 0.095), μ = (-0.358, 0.257 [41.8%]) -->  μ' = (-0.050, -0.307, 0.307)
- Layer 40: [1/12288 filtered] [1/12288 selected] Δ = 796%, Δσ² = 88.8%, σ= (0.120, 0.081), μ = (-0.332, 0.245 [42.4%]) -->  μ' = (-0.044, -0.288, 0.288)
- Layer 41: [2/12288 filtered] [1/12288 selected] Δ = 775%, Δσ² = 88.6%, σ= (0.173, 0.118), μ = (-0.493, 0.334 [40.4%]) -->  μ' = (-0.080, -0.413, 0.413)
- Layer 42: [2/12288 filtered] [1/12288 selected] Δ = 699%, Δσ² = 87.5%, σ= (0.195, 0.122), μ = (-0.543, 0.317 [36.9%]) -->  μ' = (-0.113, -0.430, 0.430)
- Layer 43: [2/12288 filtered] [1/12288 selected] Δ = 858%, Δσ² = 89.6%, σ= (0.148, 0.105), μ = (-0.453, 0.298 [39.6%]) -->  μ' = (-0.078, -0.375, 0.375)
- Layer 44: [1/12288 filtered] [1/12288 selected] Δ = 670%, Δσ² = 87.0%, σ= (0.145, 0.099), μ = (-0.380, 0.263 [40.9%]) -->  μ' = (-0.058, -0.322, 0.322)
- Layer 45: [1/12288 filtered] [1/12288 selected] Δ = 473%, Δσ² = 82.5%, σ= (0.157, 0.106), μ = (-0.346, 0.235 [40.4%]) -->  μ' = (-0.056, -0.291, 0.291)
- Layer 46: [1/12288 filtered] [1/12288 selected] Δ = 572%, Δσ² = 85.1%, σ= (0.123, 0.085), μ = (-0.298, 0.207 [40.9%]) -->  μ' = (-0.046, -0.253, 0.253)
- Layer 47: [1/12288 filtered] [1/12288 selected] Δ = 478%, Δσ² = 82.7%, σ= (0.137, 0.117), μ = (-0.325, 0.232 [41.7%]) -->  μ' = (-0.046, -0.279, 0.279)
- Layer 48: [1/12288 filtered] [1/12288 selected] Δ = 524%, Δσ² = 84.0%, σ= (0.140, 0.113), μ = (-0.340, 0.242 [41.6%]) -->  μ' = (-0.049, -0.291, 0.291)
- Layer 49: [1/12288 filtered] [1/12288 selected] Δ = 535%, Δσ² = 84.3%, σ= (0.117, 0.090), μ = (-0.283, 0.200 [41.5%]) -->  μ' = (-0.041, -0.241, 0.241)
- Layer 50: [1/12288 filtered] [1/12288 selected] Δ = 593%, Δσ² = 85.6%, σ= (0.114, 0.081), μ = (-0.283, 0.198 [41.1%]) -->  μ' = (-0.043, -0.240, 0.240)
- Layer 51: [2/12288 filtered] [1/12288 selected] Δ = 585%, Δσ² = 85.4%, σ= (0.108, 0.076), μ = (-0.270, 0.182 [40.3%]) -->  μ' = (-0.044, -0.226, 0.226)
- Layer 52: [2/12288 filtered] [1/12288 selected] Δ = 529%, Δσ² = 84.1%, σ= (0.132, 0.087), μ = (-0.313, 0.200 [39.0%]) -->  μ' = (-0.056, -0.257, 0.257)
- Layer 53: [1/12288 filtered] [1/12288 selected] Δ = 548%, Δσ² = 84.6%, σ= (0.110, 0.080), μ = (-0.262, 0.187 [41.7%]) -->  μ' = (-0.037, -0.224, 0.224)
- Layer 54: [1/12288 filtered] [1/12288 selected] Δ = 529%, Δσ² = 84.1%, σ= (0.110, 0.084), μ = (-0.267, 0.185 [41.0%]) -->  μ' = (-0.041, -0.226, 0.226)
- Layer 55: [2/12288 filtered] [1/12288 selected] Δ = 552%, Δσ² = 84.7%, σ= (0.117, 0.081), μ = (-0.285, 0.187 [39.7%]) -->  μ' = (-0.049, -0.236, 0.236)
- Layer 56: [1/12288 filtered] [1/12288 selected] Δ = 517%, Δσ² = 83.8%, σ= (0.129, 0.101), μ = (-0.318, 0.210 [39.8%]) -->  μ' = (-0.054, -0.264, 0.264)
- Layer 57: [1/12288 filtered] [1/12288 selected] Δ = 506%, Δσ² = 83.5%, σ= (0.125, 0.107), μ = (-0.314, 0.209 [40.0%]) -->  μ' = (-0.053, -0.262, 0.262)
- Layer 58: [1/12288 filtered] [1/12288 selected] Δ = 473%, Δσ² = 82.5%, σ= (0.138, 0.108), μ = (-0.325, 0.213 [39.5%]) -->  μ' = (-0.056, -0.269, 0.269)
- Layer 59: [2/12288 filtered] [1/12288 selected] Δ = 510%, Δσ² = 83.6%, σ= (0.119, 0.094), μ = (-0.291, 0.193 [39.9%]) -->  μ' = (-0.049, -0.242, 0.242)
- Layer 60: [1/12288 filtered] [1/12288 selected] Δ = 459%, Δσ² = 82.1%, σ= (0.127, 0.114), μ = (-0.314, 0.202 [39.2%]) -->  μ' = (-0.056, -0.258, 0.258)
- Layer 61: [1/12288 filtered] [1/12288 selected] Δ = 484%, Δσ² = 82.9%, σ= (0.131, 0.111), μ = (-0.329, 0.207 [38.6%]) -->  μ' = (-0.061, -0.268, 0.268)
- Layer 62: [1/12288 filtered] [1/12288 selected] Δ = 453%, Δσ² = 81.9%, σ= (0.133, 0.126), μ = (-0.333, 0.218 [39.6%]) -->  μ' = (-0.057, -0.276, 0.276)
- Layer 63: [1/12288 filtered] [1/12288 selected] Δ = 349%, Δσ² = 77.7%, σ= (0.144, 0.154), μ = (-0.327, 0.231 [41.4%]) -->  μ' = (-0.048, -0.279, 0.279)
- Layer 64: [1/12288 filtered] [1/12288 selected] Δ = 485%, Δσ² = 82.9%, σ= (0.144, 0.116), μ = (-0.352, 0.224 [38.9%]) -->  μ' = (-0.064, -0.288, 0.288)
- Layer 65: [1/12288 filtered] [1/12288 selected] Δ = 368%, Δσ² = 78.6%, σ= (0.147, 0.160), μ = (-0.343, 0.246 [41.7%]) -->  μ' = (-0.049, -0.295, 0.295)
- Layer 66: [2/12288 filtered] [1/12288 selected] Δ = 388%, Δσ² = 79.5%, σ= (0.141, 0.141), μ = (-0.337, 0.217 [39.2%]) -->  μ' = (-0.060, -0.277, 0.277)
- Layer 67: [1/12288 filtered] [1/12288 selected] Δ = 471%, Δσ² = 82.5%, σ= (0.128, 0.114), μ = (-0.304, 0.223 [42.3%]) -->  μ' = (-0.040, -0.263, 0.263)
- Layer 68: [1/12288 filtered] [1/12288 selected] Δ = 465%, Δσ² = 82.3%, σ= (0.122, 0.112), μ = (-0.296, 0.209 [41.3%]) -->  μ' = (-0.044, -0.252, 0.252)
- Layer 69: [1/12288 filtered] [1/12288 selected] Δ = 479%, Δσ² = 82.7%, σ= (0.122, 0.117), μ = (-0.304, 0.220 [42.0%]) -->  μ' = (-0.042, -0.262, 0.262)
- Layer 70: [1/12288 filtered] [1/12288 selected] Δ = 475%, Δσ² = 82.6%, σ= (0.133, 0.115), μ = (-0.321, 0.219 [40.6%]) -->  μ' = (-0.051, -0.270, 0.270)
- Layer 71: [1/12288 filtered] [1/12288 selected] Δ = 442%, Δσ² = 81.5%, σ= (0.147, 0.126), μ = (-0.339, 0.237 [41.2%]) -->  μ' = (-0.051, -0.288, 0.288)
- Layer 72: [1/12288 filtered] [1/12288 selected] Δ = 442%, Δσ² = 81.6%, σ= (0.147, 0.128), μ = (-0.342, 0.236 [40.8%]) -->  μ' = (-0.053, -0.289, 0.289)
- Layer 73: [1/12288 filtered] [1/12288 selected] Δ = 467%, Δσ² = 82.4%, σ= (0.147, 0.134), μ = (-0.361, 0.247 [40.7%]) -->  μ' = (-0.057, -0.304, 0.304)
- Layer 74: [1/12288 filtered] [1/12288 selected] Δ = 495%, Δσ² = 83.2%, σ= (0.154, 0.131), μ = (-0.383, 0.252 [39.7%]) -->  μ' = (-0.065, -0.318, 0.318)
- Layer 75: [1/12288 filtered] [1/12288 selected] Δ = 400%, Δσ² = 80.0%, σ= (0.182, 0.167), μ = (-0.413, 0.284 [40.7%]) -->  μ' = (-0.065, -0.348, 0.348)
- Layer 76: [1/12288 filtered] [1/12288 selected] Δ = 384%, Δσ² = 79.3%, σ= (0.210, 0.187), μ = (-0.470, 0.310 [39.8%]) -->  μ' = (-0.080, -0.390, 0.390)
- Layer 77: [1/12288 filtered] [1/12288 selected] Δ = 469%, Δσ² = 82.4%, σ= (0.187, 0.166), μ = (-0.460, 0.306 [39.9%]) -->  μ' = (-0.077, -0.383, 0.383)
- Layer 78: [1/12288 filtered] [1/12288 selected] Δ = 418%, Δσ² = 80.7%, σ= (0.227, 0.190), μ = (-0.518, 0.338 [39.5%]) -->  μ' = (-0.090, -0.428, 0.428)
- Layer 79: [1/12288 filtered] [1/12288 selected] Δ = 413%, Δσ² = 80.5%, σ= (0.244, 0.224), μ = (-0.560, 0.392 [41.2%]) -->  μ' = (-0.084, -0.476, 0.476)
- Layer 80: [1/12288 filtered] [1/12288 selected] Δ = 387%, Δσ² = 79.5%, σ= (0.265, 0.223), μ = (-0.580, 0.383 [39.8%]) -->  μ' = (-0.098, -0.481, 0.481)
- Layer 81: [1/12288 filtered] [1/12288 selected] Δ = 426%, Δσ² = 81.0%, σ= (0.274, 0.195), μ = (-0.600, 0.381 [38.9%]) -->  μ' = (-0.109, -0.491, 0.491)
- Layer 82: [1/12288 filtered] [1/12288 selected] Δ = 365%, Δσ² = 78.5%, σ= (0.298, 0.283), μ = (-0.651, 0.458 [41.3%]) -->  μ' = (-0.096, -0.555, 0.555)
- Layer 83: [1/12288 filtered] [1/12288 selected] Δ = 395%, Δσ² = 79.8%, σ= (0.317, 0.286), μ = (-0.717, 0.483 [40.2%]) -->  μ' = (-0.117, -0.600, 0.600)
- Layer 84: [1/12288 filtered] [1/12288 selected] Δ = 362%, Δσ² = 78.3%, σ= (0.382, 0.343), μ = (-0.862, 0.518 [37.5%]) -->  μ' = (-0.172, -0.690, 0.690)
- Layer 85: [1/12288 filtered] [1/12288 selected] Δ = 351%, Δσ² = 77.8%, σ= (0.409, 0.390), μ = (-0.906, 0.591 [39.5%]) -->  μ' = (-0.157, -0.749, 0.749)
- Layer 86: [1/12288 filtered] [1/12288 selected] Δ = 317%, Δσ² = 76.0%, σ= (0.604, 0.593), μ = (-1.298, 0.833 [39.1%]) -->  μ' = (-0.232, -1.065, 1.065)
- Layer 87: [1/12288 filtered] [1/12288 selected] Δ = 334%, Δσ² = 77.0%, σ= (0.860, 0.859), μ = (-1.934, 1.208 [38.4%]) -->  μ' = (-0.363, -1.571, 1.571)

Interested to finally see what this does when I get my GPUs back from all this control-vector creation!
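For anyone trying to read these log lines: judging only from the printed numbers (not from the script's source), each layer's figures appear to be related as follows, writing μ = (μ₋, μ₊) and σ = (σ₋, σ₊). μ' is ((μ₋ + μ₊)/2, −h, h) with h = (μ₊ − μ₋)/2; the bracketed percentage is μ₊/(μ₊ − μ₋); Δ = h² / ((σ₋² + σ₊²)/2); and Δσ² = Δ/(1 + Δ). Read that way, Δ looks like a between-class to within-class variance ratio and Δσ² like the between-class share of the total. The sketch below reproduces the Layer 87 line from its μ and σ values; because the log rounds to three decimals, other layers only match approximately, and the actual definitions in the code may differ.

```python
from dataclasses import dataclass


@dataclass
class LayerSummary:
    midpoint: float        # first value of mu' in the log
    half_distance: float   # the +/- values of mu'
    positive_share: float  # the bracketed percentage inside mu
    delta: float           # "Δ": between-class / within-class variance (inferred)
    delta_var_frac: float  # "Δσ²": between-class share of total variance (inferred)


def summarize(mu_neg: float, mu_pos: float, sd_neg: float, sd_pos: float) -> LayerSummary:
    """Recompute a log line's derived figures from its mu and sigma pairs."""
    midpoint = (mu_neg + mu_pos) / 2
    half = (mu_pos - mu_neg) / 2
    share = mu_pos / (mu_pos - mu_neg)
    within = (sd_neg ** 2 + sd_pos ** 2) / 2  # mean within-class variance
    delta = half ** 2 / within               # squared half-gap over within variance
    return LayerSummary(midpoint, half, share, delta, delta / (1 + delta))


# Layer 87: σ= (0.860, 0.859), μ = (-1.934, 1.208 [38.4%])
s = summarize(-1.934, 1.208, 0.860, 0.859)
print(f"Δ = {s.delta:.0%}, Δσ² = {s.delta_var_frac:.1%}, [{s.positive_share:.1%}]")
# Δ = 334%, Δσ² = 77.0%, [38.4%]
print(f"μ' = ({s.midpoint:.3f}, {-s.half_distance:.3f}, {s.half_distance:.3f})")
# μ' = (-0.363, -1.571, 1.571)
```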

I'm still confused who the heck is downloading these:

Downloads last month: 35,378

That's about 20x more than all the old versions added together - I think your JS tool might have helped "de-confuse" people! :D

"de-confuse" people

Lol. Would be fun if you could track how many people try to drag the screenshot.

Hehe, probably 1000s :D

I've uploaded the last few control vectors for now. I'll keep checking for new models that look like they might be useful for creative-writing every week or so and upload the control vectors for any I find.

I've made a deliberate decision to avoid the following (for now):

  • Any non-official fine-tunes that don't really help with creative writing (e.g., the "Tess" fine-tunes), as I don't think there is much use for them.
  • Merged models, as they already have their own interesting biases (compared to fine-tuned models) and it's often nearly impossible to find the correct Jinja2 template to use for them.
  • All the miqu-based models (including all my own) as these all seem to revert to "miqu style" after around 6-8k tokens and the control vectors are unlikely to help much here.
  • All the 4k-context llama-2 fine-tunes as, again, I'm not sure how much use these are now, and it's also very hard to find the correct Jinja2 template to use for them.

If anybody finds any interesting creative-writing models then please post the links to them here!


I also encourage people to try out the code and experiment with creating their own "axis" rather than just using my 8. For small and medium models, a single 24GB VRAM GPU should be enough.
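As a very rough starting point for a new axis: collect hidden states for prompts from two contrasting classes and take the normalised difference of their class means as a direction. This is the simplest baseline, not the eigenvector-based method the training log earlier in this thread reports testing. The sketch assumes one layer's hidden states are already available as lists of floats, and the function names are hypothetical.

```python
import math


def mean_vector(samples: list) -> list:
    """Component-wise mean of equal-length vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def difference_of_means_direction(neg_samples: list, pos_samples: list) -> list:
    """Unit vector pointing from the negative-class mean to the positive-class mean."""
    diff = [p - q for p, q in zip(mean_vector(pos_samples), mean_vector(neg_samples))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("class means are identical; no direction to extract")
    return [d / norm for d in diff]


def project(sample: list, direction: list) -> float:
    """Scalar position of a hidden state along the direction."""
    return sum(s * d for s, d in zip(sample, direction))


# Toy 2-D "hidden states" for two contrasting prompt classes.
neg = [[0.0, 0.0], [0.0, 2.0]]
pos = [[2.0, 0.0], [2.0, 2.0]]
direction = difference_of_means_direction(neg, pos)
print(direction)                       # [1.0, 0.0]
print(project([3.0, 5.0], direction))  # 3.0
```

Projecting a new hidden state onto the direction gives a scalar score for how far it leans toward the positive class.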


I am planning to try to create some "creative-writing critic" control vectors sometime in the next week or two, but these will be limited to just a handful of the largest and smartest models only (assuming I can even get it to work).

jukofyork pinned discussion

Pinning this as I've not had a chance to train any on the recent batch of models.

jukofyork unpinned discussion
