
sometimesanotion PRO

sometimesanotion

https://ko-fi.com/sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

new activity 5 days ago

wanlige/li-14b-v0.4-slerp0.1:Fusion vs. SLERP?

updated a model 5 days ago

sometimesanotion/Lamarck-14B-v0.7

replied to their post 6 days ago

I have tracked down a blocker preventing Lamarck releases to a della_linear bug in newer mergekit versions. If you use slices in della_linear merges that have multiple models - as you'd expect of a merge! - an attempt to load the output model in torch will get you:

```
ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.
```

This strategy was key to Lamarck v0.6 and v0.7's success. Their merge recipes haven't been working with newer mergekits.

These work:

```yaml
models:
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model: sthenno-com/miscii-14b-0218
```

```yaml
slices:
  - sources:
    - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
```

This does not:

```yaml
slices:
  - sources:
    - { layer_range: [ 0, 2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [ 0, 2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [ 2, 6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [ 2, 6 ], model: sthenno-com/miscii-14b-0218 }
```

@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work, I hope to get this fixed!


sometimesanotion's activity

New activity in wanlige/li-14b-v0.4-slerp0.1 5 days ago

Fusion vs. SLERP?

#2 opened 9 days ago by

sometimesanotion

updated a model 5 days ago

sometimesanotion/Lamarck-14B-v0.7

Text Generation • Updated 5 days ago • 7.29k • 37

replied to their post 6 days ago

To catch this bug, keep testing your merged models in PyTorch, not just as GGUF. If you submit an affected model for evaluation on the open leaderboard, the evaluation will abort.

For those who need a bit of Python to test their merged models:

import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def main(checkpoint: str) -> None:
    """Load the tokenizer and model for a checkpoint to confirm they load in PyTorch."""

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint} ({type(tokenizer).__name__})")

    # device_map="auto" already places the weights on the available device(s),
    # so no extra .to(...) call is needed; combining the two can raise an error.
    # A merge affected by the bug fails here with the ValueError.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")


def cli() -> None:
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Load a tokenizer and model from a given checkpoint."
    )
    parser.add_argument(
        "checkpoint", type=str, help="The pre-trained checkpoint name or path"
    )
    args = parser.parse_args()

    main(args.checkpoint)


if __name__ == "__main__":
    cli()
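
The loading script above needs enough memory for the whole model. A lighter check, offered here only as a sketch, compares tensor shapes between a known-good model and the merged output, since the error is a shape mismatch (`[1, 5120]` where `[5120]` is expected). The helper names `read_shapes` and `find_shape_mismatches` are made up for this example, and it assumes the `safetensors` package and torch are installed and that both models are stored as `.safetensors` shards.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def read_shapes(path: str) -> Dict[str, Shape]:
    """Read tensor names and shapes from one .safetensors shard without loading weights."""
    # Imported lazily so the comparison helper below works without safetensors.
    from safetensors import safe_open

    shapes: Dict[str, Shape] = {}
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            shapes[name] = tuple(f.get_slice(name).get_shape())
    return shapes


def find_shape_mismatches(
    reference: Dict[str, Shape], merged: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, reference_shape, merged_shape) for tensors whose shapes differ."""
    mismatches = []
    for name, ref_shape in sorted(reference.items()):
        merged_shape = merged.get(name)
        # Tensors missing from this shard are skipped; shards may split differently.
        if merged_shape is not None and merged_shape != ref_shape:
            mismatches.append((name, ref_shape, merged_shape))
    return mismatches
```

To use it, call `read_shapes` on a shard from each model (the paths are whatever your local checkpoints use) and pass both results to `find_shape_mismatches`. Any entry it returns with a leading dimension of 1 in the merged shape matches the pattern in the ValueError.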

liked a model 7 days ago

Lunzima/NQLSG-Qwen2.5-14B-MegaFusion-v8

Text Generation • Updated about 21 hours ago • 187 • 2

posted an update 7 days ago

Post

2216

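I have tracked down a blocker preventing Lamarck releases to a della_linear bug in newer mergekit versions. If you use slices in della_linear merges that have multiple models - as you'd expect of a merge! - an attempt to load the output model in torch will get you:

ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.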

This strategy was key to Lamarck v0.6 and v0.7's success. Their merge recipes haven't been working with newer mergekits.

These work:

models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }

This does not:

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }

@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work, I hope to get this fixed!
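
For anyone auditing their own recipes, here is a minimal sketch that flags the pattern described above: slices that list more than one source model. It works on the config as a plain dict, the shape you would get from parsing the YAML (for example with PyYAML's `yaml.safe_load`, which is an assumption of this example and not part of the original recipes). The function name `multi_source_slices` is made up for illustration. It only spots the layout; it does not confirm that a given mergekit version is affected.

```python
from typing import Any, Dict, List


def multi_source_slices(config: Dict[str, Any]) -> List[int]:
    """Return the indices of slices that list more than one source model."""
    return [
        index
        for index, entry in enumerate(config.get("slices", []))
        if len(entry.get("sources", [])) > 1
    ]


# The two slice layouts from the post, written as parsed dicts.
working = {
    "slices": [
        {"sources": [{"layer_range": [0, 2], "model": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3"}]},
        {"sources": [{"layer_range": [2, 6], "model": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3"}]},
    ]
}
failing = {
    "slices": [
        {"sources": [
            {"layer_range": [0, 2], "model": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3"},
            {"layer_range": [0, 2], "model": "sthenno-com/miscii-14b-0218"},
        ]},
        {"sources": [
            {"layer_range": [2, 6], "model": "sometimesanotion/Qwen2.5-14B-Vimarckoso-v3"},
            {"layer_range": [2, 6], "model": "sthenno-com/miscii-14b-0218"},
        ]},
    ]
}

print(multi_source_slices(working))  # []
print(multi_source_slices(failing))  # [0, 1]
```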

1 reply

published a model 8 days ago

sometimesanotion/Qwentessential-14B-v3

Text Generation • Updated 8 days ago • 34 • 3

updated a model 8 days ago

sometimesanotion/Qwentessential-14B-v3

Text Generation • Updated 8 days ago • 34 • 3

liked a model 8 days ago

TimeLordRaps/DS-R1-Lamarckvergence-14B-1M-test3

Text Generation • Updated 9 days ago • 9 • 1

liked 6 models 9 days ago

liked a model 11 days ago

wanlige/li-14b-v0.4

Text Generation • Updated 10 days ago • 1.17k • 14

updated a model 12 days ago

sometimesanotion/LamarckInfusion-14B-v1

Text Generation • Updated 12 days ago • 216 • 5

published a model 12 days ago

sometimesanotion/LamarckInfusion-14B-v1

Text Generation • Updated 12 days ago • 216 • 5

replied to their post 12 days ago

The numbers are in! The results are fascinating.

Though IFEVAL skewed low compared to the ancestor model's average, and Lamarckvergence's improved MATH didn't come through, this model is strong in several ways. The GPQA score suggests as much. These are scores I'm pretty sure I can improve without giving up much of the interesting gains.

What's more, my subjective impression is that its prose and consistency get a boost from Chocolatine. @jpacifico, I think arcee_fusion is a merge method that has a lot to offer for your future base models! This also bodes very well for the next several merges to come.

liked a model 13 days ago

CultriX/Qwen2.5-14B-ReasoningMerge

Text Generation • Updated 20 days ago • 269 • 3

New activity in djuna/TEST-Q2.5-Lenned-14B 13 days ago

I think what you're doing here is really helpful

#2 opened 13 days ago by

sometimesanotion