airoboros-2.2.1-limarpv3-y34b-exl2

ExLlamaV2 quantization of Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b

Branches:

  • main: measurement.json, calculated using 2048-token calibration rows from PIPPA
  • 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
    • ideal for 24 GB GPUs at 8k context (on my 24 GB Windows setup with Flash Attention 2, peak VRAM usage during inference with exllamav2_hf was about 23.4 GB, of which 0.9 GB was already in use at baseline)
  • 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
    • ideal for large (>24 GB) VRAM setups
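
As a rough sanity check on the branch recommendations above, the weight footprint can be estimated from the decoder bits per weight alone. This is a back-of-the-envelope sketch, not a measurement: the parameter count of 34 billion is an assumption based on the "34b" in the model name, the helper `estimate_weight_gib` is hypothetical, and the estimate ignores the 6-bit head layer, the KV cache, and runtime overhead.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized decoder weights in GiB."""
    total_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return total_bytes / 2**30                    # bytes -> GiB


# Assumption: roughly 34 billion weights, inferred from the "34b" model name.
ASSUMED_PARAMS = 34e9

for branch, bpw in (("4.65bpw-h6", 4.65), ("6.0bpw-h6", 6.0)):
    size = estimate_weight_gib(ASSUMED_PARAMS, bpw)
    print(f"{branch}: ~{size:.1f} GiB of weights")
```

Under these assumptions the 4.65bpw weights come to roughly 18.4 GiB, which leaves some headroom for the cache on a 24 GB card, while the 6.0bpw weights alone come to roughly 23.7 GiB and so would likely need more than 24 GB once context is added.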