This is an FP8-dynamic quantization of Mistral-Small-24B-Instruct-2501.

Mistral Small 3 (2501) is a cutting-edge 24B-parameter model that sets a new benchmark in the small-LLM category (under 70B parameters), offering state-of-the-art performance comparable to larger models. Designed for fast conversational AI, low-latency function calling, and expert fine-tuning, it excels at multilingual support, advanced reasoning, and structured output generation. Released under the Apache 2.0 license, Mistral Small 3 embodies a commitment to open-source AI, serving as a versatile foundation for both community and enterprise use.

Evaluations

This model achieves an accuracy recovery of 99.56%, i.e., the quantized model retains 99.56% of the unquantized model's average score across the benchmarks below.

| English | Mistral-Small-24B-Instruct-2501 | Mistral-Small-24B-Instruct-2501-FP8-Dynamic (this model) |
|---|---|---|
| Avg. | 76.04 | 75.70 |
| ARC | 72.60 | 72.10 |
| Hellaswag | 74.50 | 74.40 |
| MMLU | 81.01 | 80.60 |

We did not check for data contamination. Evaluation was performed using the LM Evaluation Harness with limit=1000.
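
The card does not give the exact harness invocation. A representative reproduction with the LM Evaluation Harness (lm-eval) might look like the following; the task names and the vLLM backend here are assumptions, not the authors' confirmed setup:

lm_eval --model vllm \
    --model_args pretrained=cortecs/Mistral-Small-24B-Instruct-2501-FP8-Dynamic \
    --tasks arc_challenge,hellaswag,mmlu \
    --limit 1000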

Usage

Install vLLM (pip install vllm) and start the server:

python -m vllm.entrypoints.openai.api_server \
    --model cortecs/Mistral-Small-24B-Instruct-2501-FP8-Dynamic \
    --max-model-len 16000 \
    --gpu-memory-utilization 0.9
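
Once the server is up, you can verify that the model is registered via vLLM's OpenAI-compatible models endpoint:

curl http://localhost:8000/v1/models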

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Mistral-Small-24B-Instruct-2501-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
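
Since this is an instruct-tuned model, the OpenAI-compatible chat endpoint is usually the better fit. A minimal sketch (the prompt content is illustrative):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Mistral-Small-24B-Instruct-2501-FP8-Dynamic",
        "messages": [{"role": "user", "content": "Describe San Francisco in one sentence."}]
    }'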

⚡ This model is optimized for heavy workloads, providing a total throughput of 2335 tokens per second on a single NVIDIA L40S. ⚡

Safetensors · Model size: 23.6B params · Tensor types: BF16, F8_E4M3