Mistral-Zephyr-7B-slerp

This is a merge of pre-trained language models created using MergeKit, combining the foundational capabilities of Mistral-7B with Zephyr-7B's instruction-following improvements through an efficient SLERP fusion.

About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 Connect with me on LinkedIn

Merge Details

Merge Method

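This model uses SLERP (Spherical Linear Interpolation). The interpolation factor t selects the base model (Mistral-7B-v0.1) at t = 0 and Zephyr-7B-beta at t = 1. Each list below is a gradient that is spread across the depth of the network, from the first layer to the last:

Attention Layers: Variable interpolation values [0, 0.5, 0.3, 0.7, 1], starting on Mistral's weights in the earliest layers and ending on Zephyr's in the final layers to draw on Zephyr's instruction-following capabilities
MLP Layers: Variable interpolation values [1, 0.5, 0.7, 0.3, 0], the mirrored schedule, ending on Mistral's weights in the final layers to retain Mistral's reasoning capabilities
Other Parameters: 0.5 interpolation value, an equal blend of both models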
Format: bfloat16 precision for efficient memory usage
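
As a rough illustration of what SLERP does to a pair of weight tensors, the sketch below interpolates two flattened vectors along the arc between them instead of along a straight line. It is a minimal pure-Python version written for this card, not MergeKit's implementation; the function name `slerp`, the `eps` threshold, and the fallback to linear interpolation for nearly colinear vectors are assumptions.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    t = 0 returns v0 (the base model's weights), t = 1 returns v1.
    Illustrative sketch only; not MergeKit's actual code.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy example with two orthogonal unit vectors standing in for weights.
mistral_w = [1.0, 0.0]
zephyr_w = [0.0, 1.0]
blended = slerp(0.5, mistral_w, zephyr_w)
print(blended)
```

Unlike a plain average, the midpoint of two unit vectors stays on the unit circle, which is the usual motivation for choosing SLERP over linear interpolation when merging weights.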

Models Merged

mistralai/Mistral-7B-v0.1 - The original pre-trained Mistral base model, which uses grouped-query attention and sliding-window attention
HuggingFaceH4/zephyr-7b-beta - A fine-tuned version of Mistral-7B-v0.1, aligned with Direct Preference Optimization (DPO) to follow complex instructions

Configuration

slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
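
To make the `t` lists in the configuration more concrete, the sketch below expands the five anchor values into one value per layer. It assumes the anchors are spaced evenly across the 32 layers and linearly interpolated in between; `layer_t` is a hypothetical helper written for this illustration, and MergeKit's exact scheme may differ in detail.

```python
import math

ATTN_ANCHORS = [0, 0.5, 0.3, 0.7, 1]   # filter: self_attn
MLP_ANCHORS = [1, 0.5, 0.7, 0.3, 0]    # filter: mlp
NUM_LAYERS = 32                        # layer_range: [0, 32]

def layer_t(anchors, layer_idx, num_layers=NUM_LAYERS):
    """Hypothetical helper: interpolation factor t for a single layer.

    Assumes the anchors are evenly spaced over the layer stack and
    linearly interpolated between neighbours.
    """
    # Fractional position of this layer within the anchor list.
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]

attn_t = [layer_t(ATTN_ANCHORS, i) for i in range(NUM_LAYERS)]
mlp_t = [layer_t(MLP_ANCHORS, i) for i in range(NUM_LAYERS)]

# Print a few layers to see how the two schedules mirror each other.
for i in (0, 8, 16, 24, 31):
    print(f"layer {i:2d}: self_attn t={attn_t[i]:.2f}  mlp t={mlp_t[i]:.2f}")
```

Under this assumption the first layer takes its attention weights entirely from Mistral (t = 0) and its MLP weights entirely from Zephyr (t = 1), and the last layer does the opposite.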

Model Capabilities

This merge combines:

Mistral's strong foundational knowledge and reasoning
Zephyr's improved instruction following and coherence
Openly available weights under a permissive license

The resulting model is intended for tasks that require both strong reasoning and good instruction following, such as:

Detailed explanations of complex concepts
Creative writing with coherent structure
Problem-solving with step-by-step reasoning
Balanced factual responses with nuanced perspectives

Limitations

Inherits limitations from both base models
May exhibit inconsistent behavior for certain complex reasoning tasks
No additional alignment or fine-tuning beyond the base models' training
Model was created through parameter merging without additional training data

License

This model is released under the Apache 2.0 license. Mistral-7B-v0.1 is also released under Apache 2.0, and zephyr-7b-beta is released under the MIT license.
