# QwQ-32B-Preview-bnb-4bit

## Introduction
QwQ-32B-Preview-bnb-4bit is a 4-bit quantized version of the QwQ-32B-Preview model, produced with the bitsandbytes (bnb) quantization library. Quantizing the weights to 4 bits substantially reduces the model's memory footprint compared with the full-precision checkpoint, making it practical to load and run on resource-constrained hardware such as a single consumer GPU.
## Model Details
- Quantization: 4-bit using Bits and Bytes (bnb)
- Base Model: Qwen/QwQ-32B-Preview
- Parameters: 32.5 billion
- Context Length: Up to 32,768 tokens