
sometimesanotion PRO
sometimesanotion's activity
Fusion vs. SLERP?


You need to keep testing models in PyTorch, not just as GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the run will abort.
For those who need a bit of Python to test their merged models:
import argparse

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def main(checkpoint: str) -> None:
    """Load the tokenizer and model for a checkpoint to confirm it loads in PyTorch."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")
    # device_map="auto" already places the weights, so no extra .to() call is needed.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")


def cli() -> None:
    """CLI entry point."""
    parser = argparse.ArgumentParser(description='Load a tokenizer and model from a given checkpoint.')
    parser.add_argument('checkpoint', type=str, help='The pre-trained checkpoint name or path')
    args = parser.parse_args()
    main(args.checkpoint)


if __name__ == "__main__":
    cli()

If you use slices in della_linear merges where each slice has multiple source models - as you'd expect of a merge! - an attempt to load the output model in torch will get you:
ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.
This strategy was key to Lamarck v0.6 and v0.7's success. Their merge recipes haven't been working with newer versions of mergekit.
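For anyone who wants to find the offending tensors without loading the whole model, here is a rough sketch. It assumes the problem shows up as a 2-D tensor with a stray leading dimension of 1, as in the error above. The helper names and the example tensor name are made up for illustration, and read_shapes assumes the safetensors package is installed.

```python
from typing import Dict, List, Sequence, Tuple


def find_leading_singleton(shapes: Dict[str, Sequence[int]]) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (name, shape) for every 2-D tensor whose first dimension is 1."""
    suspects = []
    for name, shape in sorted(shapes.items()):
        shape = tuple(shape)
        if len(shape) == 2 and shape[0] == 1:
            suspects.append((name, shape))
    return suspects


def read_shapes(safetensors_path: str) -> Dict[str, Tuple[int, ...]]:
    """Read tensor shapes from one .safetensors shard without loading the weights."""
    from safetensors import safe_open  # imported lazily; only needed for real files

    shapes = {}
    with safe_open(safetensors_path, framework="pt") as f:
        for name in f.keys():
            shapes[name] = tuple(f.get_slice(name).get_shape())
    return shapes


# Illustrative shapes only; the tensor names here are hypothetical.
example_shapes = {
    "model.layers.0.input_layernorm.weight": (1, 5120),
    "model.layers.1.input_layernorm.weight": (5120,),
    "model.layers.0.self_attn.q_proj.weight": (5120, 5120),
}
print(find_leading_singleton(example_shapes))
```

On a real merge you would call read_shapes on each shard in the output directory and pass the result to find_leading_singleton. This is only a heuristic: a model could legitimately contain a [1, N] tensor, so treat hits as things to look at, not proof of the bug.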
These work:
models:
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model: sthenno-com/miscii-14b-0218

slices:
  - sources:
      - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
      - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
This does not:
slices:
  - sources:
      - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
      - { layer_range: [ 0, 2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
      - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
      - { layer_range: [ 2, 6 ], model: sthenno-com/miscii-14b-0218 }
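If you have a pile of old recipes and want to know which ones are likely to hit this, a quick check along these lines might help. It is a sketch, not part of mergekit: slices_with_multiple_models is a hypothetical helper, and it works on the dict you would get from parsing the recipe YAML yourself.

```python
from typing import Any, Dict, List


def slices_with_multiple_models(config: Dict[str, Any]) -> List[int]:
    """Return the indices of slices whose sources name more than one distinct model."""
    flagged = []
    for index, slice_def in enumerate(config.get("slices", [])):
        models = {source["model"] for source in slice_def.get("sources", [])}
        if len(models) > 1:
            flagged.append(index)
    return flagged


VIMARCKOSO = "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3"
MISCII = "sthenno-com/miscii-14b-0218"

# The failing recipe, written as the dict a YAML parser would produce.
failing = {
    "slices": [
        {"sources": [
            {"layer_range": [0, 2], "model": VIMARCKOSO},
            {"layer_range": [0, 2], "model": MISCII},
        ]},
        {"sources": [
            {"layer_range": [2, 6], "model": VIMARCKOSO},
            {"layer_range": [2, 6], "model": MISCII},
        ]},
    ]
}

# The single-model slices recipe that still works.
working = {
    "slices": [
        {"sources": [{"layer_range": [0, 2], "model": VIMARCKOSO}]},
        {"sources": [{"layer_range": [2, 6], "model": VIMARCKOSO}]},
    ]
}

print(slices_with_multiple_models(failing))  # [0, 1]
print(slices_with_multiple_models(working))  # []
```

A non-empty result only says the recipe has the same shape as the one that fails here; whether a given mergekit version actually produces a broken model still has to be confirmed by loading the output in PyTorch.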
@Crystalcareai , do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work, and I hope to get this fixed!

microsoft/Phi-4-mini-instruct
YOYO-AI/Qwen2.5-14B-YOYO-V4-p2

Lunzima/NQLSG-Qwen2.5-14B-OriginalFusion

Lunzima/NQLSG-Qwen2.5-14B-MegaFusion-v8.7

wanlige/li-14b-v0.4-slerp0.1
CultriX/Qwen2.5-14B-GeneralReasoning

The numbers are in! The results are fascinating.
Though IFEVAL skewed low compared to the ancestor model's average, and Lamarckvergence's improved MATH didn't come through, this model is strong in several ways. The GPQA score suggests as much. These are scores I'm pretty sure I can improve without giving up much of the interesting gains.
What's more, my subjective impression is that its prose and consistency get a boost from Chocolatine. @jpacifico , I think arcee_fusion is a merge method that has a lot to offer for your future base models! This also bodes very well for the next several merges to come.
I think what you're doing here is really helpful
