sometimesanotion posted an update 7 days ago
I have traced the blocker preventing Lamarck releases to a della_linear bug in newer mergekit versions.

If you use slices in a della_linear merge that draw from multiple models - as you'd expect of a merge! - an attempt to load the output model in torch gets you:

```
ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.
```

The shapes suggest a 1-D weight tensor - most likely a norm weight - being written with an extra leading dimension.


This slicing strategy was key to Lamarck v0.6 and v0.7's success, but their merge recipes haven't been working with newer mergekit versions.

These work:

```yaml
models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218
```

```yaml
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
```

This does not:

```yaml
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }
```
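
For reference, these slice layouts sit inside a full recipe along these lines. This is only a sketch to show the surrounding structure - the base_model choice and the density/weight values here are placeholders, not my actual Lamarck settings:

```yaml
# Sketch of a complete della_linear recipe around the failing slice layout.
# Parameter values are placeholders, not the actual Lamarck settings.
merge_method: della_linear
base_model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
dtype: bfloat16
parameters:
  density: 0.7
  weight: 0.5
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
```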


@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Thank you for all the cool work - I hope to get this fixed!

You need to keep testing your models in PyTorch, not just GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the run will abort.

For those who need a bit of Python to test their merged models:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def main(checkpoint: str) -> None:
    """Load the tokenizer and model from a checkpoint to verify its tensors."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")

    # device_map="auto" lets accelerate place the weights; calling .to() on a
    # model dispatched this way is unnecessary and errors in newer transformers.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")


def cli():
    """CLI entry point."""
    import argparse

    parser = argparse.ArgumentParser(
        description="Load a tokenizer and model from a given checkpoint."
    )
    parser.add_argument("checkpoint", type=str, help="The pre-trained checkpoint name or path")
    args = parser.parse_args()
    main(args.checkpoint)


if __name__ == "__main__":
    cli()
```
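
Save it as, say, load_check.py (the filename is just an example) and point it at your merged model's directory: `python load_check.py ./your-merged-model`. A merge affected by this bug will raise the ValueError above as soon as the weights load.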