sometimesanotion PRO

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

updated a model 5 days ago
sometimesanotion/Lamarck-14B-v0.7

Organizations

Hugging Face Discord Community

sometimesanotion's activity

New activity in wanlige/li-14b-v0.4-slerp0.1 5 days ago
replied to their post 6 days ago

You need to keep testing models in PyTorch, not just GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the evaluation will abort.

For those who need a bit of Python to test their merged models:

import argparse

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def main(checkpoint: str) -> None:
    """Load the tokenizer and model for the given checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")

    # device_map="auto" already places the weights; a follow-up .to() call
    # conflicts with the dispatched device map, so don't add one.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")


def cli() -> None:
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Load a tokenizer and model from a given checkpoint."
    )
    parser.add_argument(
        "checkpoint", type=str, help="The pre-trained checkpoint name or path"
    )
    args = parser.parse_args()
    main(args.checkpoint)


if __name__ == "__main__":
    cli()
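
Save it as, say, check_load.py - the filename is just for illustration - and point it at a merge output directory or Hub repo ID, for example:

python check_load.py sometimesanotion/Lamarck-14B-v0.7

A merge with the shape bug fails right at the from_pretrained call, which is presumably the same failure the leaderboard evaluation hits.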
posted an update 7 days ago

I have tracked the blocker that has been preventing Lamarck releases down to a della_linear bug in newer mergekit versions.

If your della_linear merge uses slices that pull from multiple models - as you'd expect of a merge! - attempting to load the output model in PyTorch gets you:

ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.


This strategy was key to the success of Lamarck v0.6 and v0.7, and their merge recipes haven't been working with newer mergekit versions.

These work:
models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }


This does not:
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }
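
For anyone who wants to see exactly which tensors came out mis-shaped before filing a report, here is a rough diagnostic sketch, not a fix: it scans a merged model's safetensors shards for parameters with a leading singleton dimension, a heuristic of mine based on the [1, 5120]-vs-[5120] error above. The function name and glob pattern are just illustrative.

import glob
import sys

from safetensors import safe_open


def scan_for_bad_shapes(model_dir: str) -> None:
    """Flag tensors shaped like [1, N] that were probably meant to be 1-D."""
    for shard in sorted(glob.glob(f"{model_dir}/*.safetensors")):
        with safe_open(shard, framework="pt") as f:
            for name in f.keys():
                shape = tuple(f.get_slice(name).get_shape())
                # Heuristic: 1-D parameters (norm weights, biases) that picked
                # up an extra leading dimension during the merge.
                if len(shape) == 2 and shape[0] == 1:
                    print(f"{shard}: {name} has suspicious shape {shape}")


if __name__ == "__main__":
    scan_for_bad_shapes(sys.argv[1])

Run it against the merge output directory; if it prints nothing, the shards at least have plausible shapes.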


@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work, and I hope to get this fixed!

replied to their post 12 days ago

The numbers are in! The results are fascinating.
(screenshot of the benchmark results)

Though IFEval skewed low compared to the ancestor model's average, and Lamarckvergence's improved MATH score didn't come through, this model is strong in several ways; the GPQA score suggests as much. These are scores I'm pretty sure I can improve without giving up much of the interesting gains.

What's more, my subjective impression is that its prose and consistency get a boost from Chocolatine. @jpacifico, I think arcee_fusion is a merge method that has a lot to offer your future base models! This also bodes very well for the next several merges to come.