merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the DARE TIES merge method, with CultriX/SeQwence-14Bv1 as the base model.
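
For intuition, the sketch below shows roughly what the weight and density parameters in the configuration control: DARE keeps each delta parameter (the difference between a fine-tuned model and the base) with probability equal to density and rescales the survivors, and TIES then elects a majority sign per parameter so that only sign-consistent, weighted deltas are added back to the base. This is a simplified, hypothetical illustration written for this card, not mergekit's actual implementation; the function name and signature are invented for the example.

# Simplified, hypothetical sketch of the DARE-TIES idea for a single tensor.
# Not mergekit's code; details (normalization, sign election) differ in practice.
import torch

def dare_ties_merge(base, finetuned, weights, densities, seed=0):
    """Merge one tensor from several fine-tuned models into `base`.

    base:      base-model tensor
    finetuned: list of tensors with the same shape as `base`
    weights:   per-model scaling factors (e.g. 0.38, 0.32, ...)
    densities: per-model fraction of delta parameters to keep (e.g. 0.65, ...)
    """
    torch.manual_seed(seed)
    deltas = []
    for ft, w, d in zip(finetuned, weights, densities):
        delta = ft - base                         # task vector relative to the base
        keep = torch.rand_like(delta) < d         # DARE: keep each entry with prob. `density`
        delta = torch.where(keep, delta / d, torch.zeros_like(delta))  # rescale kept entries
        deltas.append(w * delta)                  # apply the per-model weight
    stacked = torch.stack(deltas)
    elected_sign = torch.sign(stacked.sum(dim=0))       # TIES: majority-sign election
    agree = torch.sign(stacked) == elected_sign         # keep only sign-consistent entries
    merged_delta = (stacked * agree).sum(dim=0)
    return base + merged_delta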

Models Merged

The following models were included in the merge:

CultriX/Qwen2.5-14B-Wernickev3
CultriX/Qwen2.5-14B-FinalMerge
CultriX/Qwen2.5-14B-Emergedv3
qingy2019/Qwen2.5-Math-14B-Instruct

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.38       # Slight reduction to balance with FinalMerge's generalist capabilities.
      density: 0.65      # Retain significant parameters for stability and strong task performance.
  - model: CultriX/Qwen2.5-14B-FinalMerge
    parameters:
      weight: 0.32       # Slight increase to ensure its generalist capabilities are fully utilized.
      density: 0.60      # Balanced density for comprehensive task coverage.
  - model: CultriX/Qwen2.5-14B-Emergedv3
    parameters:
      weight: 0.20       # Retains focused contribution to specific task optimizations.
      density: 0.55      # Moderate density ensures efficient parameter usage.
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.10       # Consistent with its specialist focus, balancing lower weight with higher density.
      density: 0.70      # High density ensures retention of advanced reasoning and MATH-related parameters.

merge_method: dare_ties
base_model: CultriX/SeQwence-14Bv1
parameters:
  normalize: true        # Ensures all models are scaled to compatible parameter ranges.
  int8_mask: true        # Optimizes memory and computational efficiency without accuracy loss.
dtype: bfloat16          # Provides better memory efficiency and numerical stability.

adaptive_merge_parameters:
  task_weights:
    tinyArc: 1.3         # Slight reduction to balance with generalist contributions.
    tinyHellaswag: 1.3   # Maintains strong performance in contextual reasoning.
    tinyMMLU: 1.2        # Balanced focus for domain-specific knowledge.
    tinyTruthfulQA: 1.2  # Adjusted to ensure fair contribution without over-prioritization.
    tinyTruthfulQA_mc1: 1.1 # Maintains a moderate priority to balance with other tiny benchmarks.
    tinyWinogrande: 1.2  # Strong contextual reasoning support from generalist models.
    IFEval: 1.5          # High weight for general instruction-following capabilities.
    BBH: 1.5             # Prioritizes complex reasoning and multi-step problem-solving tasks.
    MATH: 1.55           # Slight reduction to balance MATH with other advanced reasoning benchmarks.
    GPQA: 1.4            # Balanced to reflect contributions from both generalist and specialist models.
    MUSR: 1.4            # Increased slightly to strengthen multi-step reasoning.
    MMLU-PRO: 1.3        # Maintains general task performance across multitask domain knowledge.
  smoothing_factor: 0.18  # Slightly increased for smoother blending across task boundaries.
gradient_clipping: 0.88   # Tightened slightly for stability, preventing parameter over-contribution.
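
To reproduce the merge, the configuration above can be saved to a YAML file and passed to mergekit (for example via its mergekit-yaml command-line tool). The resulting checkpoint, published here as CultriX/Qwen2.5-14B-FinalMergev2, can then be loaded with the standard transformers API. The snippet below is a minimal usage sketch assuming that repository name; adjust the path if you merged the model locally instead.

# Minimal usage sketch: load the merged checkpoint with transformers.
# The repository name below is taken from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CultriX/Qwen2.5-14B-FinalMergev2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the dtype used for the merge
    device_map="auto",
)

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))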