QwQ-32B-Preview-bnb-4bit

Introduction

QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits cuts the model's memory footprint to roughly a quarter of its 16-bit size, making it practical to deploy on resource-constrained hardware.

Model Details

  • Quantization: 4-bit using bitsandbytes (bnb)
  • Base Model: Qwen/QwQ-32B-Preview
  • Parameters: 32.5 billion
  • Context Length: Up to 32,768 tokens
Safetensors Checkpoint

  • Model size: 17.7B stored params
  • Tensor types: F32, BF16, U8
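The stored parameter count (17.7B) being roughly half the true 32.5B is expected: bitsandbytes packs two 4-bit weights into each uint8 byte, so the packed weight tensors count half as many elements, with the remainder kept in higher precision (embeddings, norms, quantization constants). A back-of-envelope sketch of the resulting weight footprint:

```python
# Rough memory estimate for 4-bit quantized weights.
# Two 4-bit values pack into one uint8 byte -> 0.5 bytes per parameter,
# which is also why the safetensors element count (~17.7B) is roughly
# half the true parameter count plus the higher-precision tensors.
params = 32.5e9
bytes_4bit = params * 0.5  # 0.5 bytes per 4-bit weight
print(f"~{bytes_4bit / 1e9:.1f} GB for the packed weights alone")
```

This excludes quantization constants and any layers kept in F32/BF16, so the real on-disk size is somewhat larger.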

Model tree for kurcontko/QwQ-32B-Preview-bnb-4bit

  • Base model: Qwen/Qwen2.5-32B
  • Quantized: this model (one of 97 quantized derivatives)