Quantizations of https://huggingface.co./microsoft/phi-4

Inference Clients/UIs


From original readme

Developers Microsoft Research
Description phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures
Architecture 14B parameters, dense decoder-only Transformer model
Inputs Text, best suited for prompts in the chat format
Context length 16K tokens
GPUs 1920 H100-80G
Training time 21 days
Training data 9.8T tokens
Outputs Generated text in response to input
Dates October 2024 โ€“ November 2024
Status Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data
Release date December 12, 2024
License MIT

Input Formats

Given the nature of the training data, phi-4 is best suited for prompts using the chat format as follows:

<|im_start|>system<|im_sep|>
You are a medieval knight and must provide explanations to modern people.<|im_end|>
<|im_start|>user<|im_sep|>
How should I explain the Internet?<|im_end|>
<|im_start|>assistant<|im_sep|>

With transformers

import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model="microsoft/phi-4",
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
]

outputs = pipeline(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1])
Downloads last month
605
GGUF
Model size
14.7B params
Architecture
phi3

1-bit

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Examples
Inference API (serverless) has been turned off for this model.