This repository contains an AWQ INT4 quantization of the original model microsoft/Phi-3.5-MoE-instruct, the official FP16 (half-precision) version released by Microsoft.

Model Summary

Phi-3.5-MoE is a lightweight, state-of-the-art open model built on the datasets used for Phi-3 (synthetic data and filtered publicly available documents), with a focus on very high-quality, reasoning-dense data. The model is multilingual and supports a context length of 128K tokens. It underwent a rigorous enhancement process incorporating supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.

🏡 Phi-3 Portal
📰 Phi-3 Microsoft Blog
📖 Phi-3 Technical Report
👩‍🍳 Phi-3 Cookbook
🖥️ Try It

MoE references: 📜 Phi-3.5-MoE Blog | 😁 GRIN MoE

Phi-3.5: [mini-instruct]; [MoE-instruct]; [vision-instruct]

Running 🏃

TGI

model=danieldk/Phi-3.5-MoE-instruct-AWQ-INT4
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.4.0 \
    --model-id $model --num-shard 2
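
Once the container is up, the server can be queried over HTTP. The following is a minimal sketch, assuming the container started above is reachable at localhost:8080 (the host port mapped by -p 8080:80). It posts a prompt to TGI's /generate endpoint. The names TGI_URL, build_payload, and generate are illustrative helpers, not part of TGI.

```shell
# Assumption: the TGI container from the command above is listening on localhost:8080.
TGI_URL="${TGI_URL:-http://localhost:8080}"

# Build the JSON body for TGI's /generate endpoint.
#   $1 = prompt (keep it free of double quotes and backslashes; no JSON escaping is done here)
#   $2 = max_new_tokens
build_payload() {
    printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "$2"
}

# Send a prompt to the server and print the raw JSON response.
#   $1 = prompt, $2 = max_new_tokens (defaults to 64)
generate() {
    curl -s "$TGI_URL/generate" \
        -X POST \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1" "${2:-64}")"
}

# Example call (requires the running container):
# generate "What is a mixture-of-experts model?" 64
```

The response is a JSON object whose generated_text field holds the completion. Prompts containing quotes or newlines would need proper JSON escaping, for example with a tool such as jq.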

Quantization Reproduction

Coming soon (an AutoAWQ patch needs to be upstreamed first).

Safetensors model size: 5.83B params. Tensor types: I32, FP16.