GGUF
Inference Endpoints
conversational
Edit model card

QuantFactory Banner

QuantFactory/EVA-D-Qwen2.5-1.5B-v0.0-GGUF

This is quantized version of EVA-UNIT-01/EVA-D-Qwen2.5-1.5B-v0.0 created using llama.cpp

Original Model Card

EVA-D Qwen2.5-1.5B v0.0

An experimental online logit distillation of EVA-Qwen2.5-14B-v0.1 into Qwen2.5-1.5B. Should work as a RP/storywriting specialist, but don't expect superb performance from it, due to it's small size. All in all, it was a fun experiment to do.

Note: using quantized KV cache with Qwen2.5 is not recommended and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.

Prompt format is ChatML.


Recommended sampler values:

  • Temperature: 1
  • Min-P: 0.02

Recommended SillyTavern presets (via CalamitousFelicitousness):


Distillation data:

  • Arcee.AI's EvolKit-20k dataset, which is specifically made for knowledge distillation purposes.

Training time and hardware:

  • 1.8 hours on 8xA100 SXM, provided by Garg

Model was trained by Kearm and Auri.

Special thanks:

  • to Garg for generously providing 8xA100 SXM node for this experiment!
  • to Arcee.AI for creating DistillKit and EvolKit-20k dataset, which were used to create this model.
  • and to Allura-org for support and feedback on EVA models.
Downloads last month
434
GGUF
Model size
1.54B params
Architecture
qwen2

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference API
Unable to determine this model's library. Check the docs .

Model tree for QuantFactory/EVA-D-Qwen2.5-1.5B-v0.0-GGUF

Base model

Qwen/Qwen2.5-1.5B
Quantized
(25)
this model

Dataset used to train QuantFactory/EVA-D-Qwen2.5-1.5B-v0.0-GGUF