What is this
My experiment. A continuation of the Benchmaxxxer series (meme models), but a bit more serious. It scores high on my benchmark and on the Hugging Face leaderboard, and moderately high in practice. Worth trying? Yeah. It is on the gooder side.
Observations
- GPTslop: medium-low. Avoid letting it slip in at all costs, though, or it won't stop generating it.
- Writing style: difficult to describe; not the usual stuff. It runs a bit on autopilot: if you write your usual lazy "ahh ahh mistress", it can give you a whole page of good text in return. High.
- Censorship: if you can handle Xwin, you can handle this model. Medium.
- Optimism: medium-low.
- Violence: medium-low.
- Intelligence: medium.
- Creativity: medium-high.
- Doesn't like high temperature; keep it below 1.5 (see the sampling sketch below).
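If you want a concrete starting point for sampler settings that respect the temperature note above, something like this works (the values are illustrative, not tuned):

```python
from transformers import GenerationConfig

# Illustrative sampling settings; the only firm constraint from the
# observations above is keeping temperature below ~1.5.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=1.1,          # comfortably under the ~1.5 ceiling
    top_p=0.95,
    repetition_penalty=1.1,
    max_new_tokens=512,
)
# Pass it to generation with: model.generate(**inputs, generation_config=gen_config)
```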
Prompt format
Vicuna or Alpaca.
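For reference, this is roughly what those two templates look like (the Vicuna system line below is the common default and just an assumption; swap in whatever you normally use):

```python
# Rough sketch of the Vicuna and Alpaca prompt templates this model accepts.
def vicuna_prompt(user_message: str) -> str:
    system = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions."
    )
    return f"{system} USER: {user_message} ASSISTANT:"

def alpaca_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )
```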
Merge Details
This is a merge of pre-trained language models created using mergekit.
This model was merged using the linear merge method.
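Conceptually, a linear merge is just a weighted average of the source models' matching tensors. A minimal sketch of the idea (not mergekit's actual code, which also handles per-layer weight gradients, dtypes, and tokenizers):

```python
from typing import Dict, List
import torch

def linear_merge(
    state_dicts: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Weighted average of matching tensors from several models (conceptual sketch)."""
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(
            w * sd[name].float() for sd, w in zip(state_dicts, weights)
        ) / total
    return merged
```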
Models Merged
The following models were included in the merge:
- spicyboros
- xwin
- euryale
- dolphin
- wizard
- WinterGoddess
Configuration
The following YAML configuration was used to produce this model:
models:
  - model: spicyboros
    parameters:
      weight: [0.093732305, 0.403220342, 0.055438423, 0.043830778, 0.054189303, 0.081136828]
  - model: xwin
    parameters:
      weight: [0.398943486, 0.042069007, 0.161586088, 0.470977297, 0.389315704, 0.416739102]
  - model: euryale
    parameters:
      weight: [0.061483013, 0.079698633, 0.043067724, 0.00202751, 0.132183868, 0.36578003]
  - model: dolphin
    parameters:
      weight: [0.427942847, 0.391488452, 0.442164138, 0, 0, 0.002174793]
  - model: wizard
    parameters:
      weight: [0.017898349, 0.083523566, 0.297743627, 0.175345857, 0.071770095, 0.134169247]
  - model: WinterGoddess
    parameters:
      weight: [0, 0, 0, 0.30781856, 0.352541031, 0]
merge_method: linear
dtype: float16
tokenizer_source: base
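Each `weight` entry above is a list rather than a single number. As far as I understand mergekit, such a list is read as a gradient over layer depth: the values act as evenly spaced anchor points from the first to the last layer, and the weight for any given layer is linearly interpolated between them. A small sketch of that interpretation (an assumption about mergekit's behaviour, not its code):

```python
import numpy as np

def layer_weight(anchors, layer_idx, num_layers):
    """Interpolate a per-layer weight from a list of anchor values.

    Assumes anchors are spread evenly over layer depth and linearly
    interpolated, which is my reading of mergekit's gradient weights.
    """
    positions = np.linspace(0.0, 1.0, num=len(anchors))
    frac = layer_idx / max(num_layers - 1, 1)
    return float(np.interp(frac, positions, anchors))

# Example: the xwin weight around the middle of an 80-layer 70B model.
xwin = [0.398943486, 0.042069007, 0.161586088, 0.470977297, 0.389315704, 0.416739102]
print(layer_weight(xwin, layer_idx=40, num_layers=80))
```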
Benchmarks
NeoEvalPlusN_benchmark
Name | B | C | D | S | P | total | BCD | SP |
---|---|---|---|---|---|---|---|---|
ChuckMcSneed/PMaxxxer-v1-70b | 3 | 1 | 1 | 6.75 | 4.75 | 16.5 | 5 | 11.5 |
ChuckMcSneed/SMaxxxer-v1-70b | 2 | 1 | 0 | 7.25 | 4.25 | 14.5 | 3 | 11.5 |
ChuckMcSneed/ArcaneEntanglement-model64-70b | 3 | 2 | 1 | 7.25 | 6 | 19.25 | 6 | 13.25 |
Absurdly high. That's what happens when you optimize the merges for a benchmark.
Open LLM Leaderboard Evaluation Results
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
---|---|---|---|---|---|---|---|
ChuckMcSneed/ArcaneEntanglement-model64-70b | 72.79 | 71.42 | 87.96 | 70.83 | 60.53 | 83.03 | 63 |
ChuckMcSneed/PMaxxxer-v1-70b | 72.41 | 71.08 | 87.88 | 70.39 | 59.77 | 82.64 | 62.7 |
ChuckMcSneed/SMaxxxer-v1-70b | 72.23 | 70.65 | 88.02 | 70.55 | 60.7 | 82.87 | 60.58 |
This model is simply superior to my other meme models here.