sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation

Recent Activity

Updated sometimesanotion/Lamarck-14B-v0.7 (5 days ago)

Organizations

Hugging Face Discord Community

Posts (6)

Post
I have tracked a blocker on Lamarck releases down to a della_linear bug in newer mergekit versions.

If a slice in a della_linear merge draws from multiple models (as you'd expect of a merge!), an attempt to load the output model in torch gets you:

ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.
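
In case it helps reproduce: a minimal sketch of the failing load, assuming the della_linear output landed in ./merged-output (the path is illustrative):

from transformers import AutoModelForCausalLM

# Illustrative path; point this at the directory mergekit wrote the merge to.
model = AutoModelForCausalLM.from_pretrained("./merged-output")
# On affected mergekit versions, loading raises:
# ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight"
# (which has shape torch.Size([5120])), this looks incorrect.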


This strategy was key to the success of Lamarck v0.6 and v0.7, but their merge recipes haven't been working with newer mergekit versions.

These work:
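# Two variants that load cleanly: a flat models list, and slices whose
# sources each reference a single model.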
models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }


This does not:
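# Slices whose sources reference two models each; on newer mergekit versions
# the della_linear output then fails to load with the ValueError shown above.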
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }


@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work; I hope to get this fixed!
Post
I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:

https://huggingface.co./sometimesanotion/Lamarck-14B-v0.7-Fusion

It unifies three branches, all of which feature models that bring Lamarck-14B-v0.7 and Qwenvergence-14B-v12-Prose together. One side features @jpacifico's jpacifico/Chocolatine-2-14B-Instruct-v2.0.3, and the other features @suayptalha's suayptalha/Lamarckvergence-14B paired with my models that were their merge ancestors.

A fusion merge (of a fusion merge, and of a SLERP of a fusion and an older merge) should demonstrate the new merge method's behavior in interesting ways, especially in the first quarter of the model, where the SLERP has less impact.

I welcome you to kick the tires and learn from it. As you'd expect, its prose quality is close to Qwenvergence v12's.

Thank you, @mradermacher and @MaziyarPanahi, for the first-day quantizations! Your work helped get me started. https://huggingface.co./models?other=base_model:quantized:sometimesanotion/Lamarck-14B-v0.7-Fusion

Datasets

None public yet