Open R1: Update #4

Welcome DeepSeek-V3 0324
This week, a new model from DeepSeek silently landed on the Hub. It’s an updated version of DeepSeek-V3, the base model underlying the R1 reasoning model. There isn’t much information shared yet on this new model, but we do know a few things!
What we know so far
The model has the same architecture as the original DeepSeek-V3 and now also comes with an MIT license, while the previous V3 model had a custom model license. The focus of this release was on improving instruction following as well as code and math capabilities. Let’s have a look!
How good is it?
The DeepSeek team has evaluated the model on a range of math and coding tasks and we can see the model’s strong capabilities compared to other frontier models:
Clearly, the model plays in the top league: often on par with GPT-4.5 and generally stronger than Claude-Sonnet-3.7.
To summarise, the model has seen significant improvements across benchmarks:
- MMLU-Pro: 75.9 → 81.2 (+5.3) (A good benchmark for overall understanding)
- GPQA: 59.1 → 68.4 (+9.3)
- AIME: 39.6 → 59.4 (+19.8) (proxy for MATH capabilities)
- LiveCodeBench: 39.2 → 49.2 (+10.0) (indicator of coding abilities)
Specifically, in the model card the DeepSeek team mentions targeted improvements in the following areas:
- Front-End Web Development
  - Improved executability of the code
  - More aesthetically pleasing web pages and game front-ends
- Chinese Writing Proficiency
  - Enhanced style and content quality
  - Aligned with the R1 writing style
  - Better quality in medium-to-long-form writing
- Feature Enhancements
  - Improved multi-turn interactive rewriting
  - Optimized translation quality and letter writing
  - Enhanced style and content quality
- Chinese Search Capabilities
  - Enhanced report analysis requests with more detailed outputs
- Function Calling Improvements
  - Increased accuracy in Function Calling, fixing issues in previous V3 versions
So the question might pop up: how did they actually do this? Let’s speculate a bit!
How did they do it?
Given the naming and architecture, it is fairly safe to assume that the new model is based on the previous V3 model and trained on top of it. There are two main areas where they could have improved the model:
- Continual pretraining: Starting with the V3 model, it’s possible to continue the pretraining process with a) newer, more up-to-date data and b) data that has been better curated and is thus of higher quality. This improves factuality on recent events and capabilities in general.
- Improved post-training: Especially for instruction following and style, post-training plays the most important role. Likely they improved the post-training data mix and maybe even the algorithm.
Until the team releases a technical report we won’t know for sure what they tweaked, but improvements to the post-training pipeline are quite likely, potentially combined with a bit of additional pretraining. So have a look at how to use the model next!
How to use the model
Inference Providers
You can use Hugging Face’s Inference Providers to quickly experiment with this model. It’s available through Fireworks, Hyperbolic, and Novita.
Here’s an example using the huggingface_hub library. You can also use the OpenAI client library like in this example.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="fireworks-ai",
    # api_key="your hf or provider token"
)

messages = [
    {
        "role": "user",
        "content": "My first is second in line; I send shivers up your spine; not quite shining bright. I glitter in the light."
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=messages,
    temperature=0.3,
)

print(completion.choices[0].message["content"])
# ...**Final Answer: ice**
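If you prefer the OpenAI client library, a roughly equivalent call could look like the sketch below. The base_url points at Hugging Face’s Inference Providers router and is an assumption here, so double-check the Inference Providers docs for the exact endpoint; the HF token is passed as the API key.

from openai import OpenAI

# Point the OpenAI client at Hugging Face's Inference Providers router
# (base_url is an assumption -- check the Inference Providers docs).
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key="your hf token",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Write a haiku about open models."}],
    temperature=0.3,
)

print(completion.choices[0].message.content)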
Text Generation Inference
TGI supports running DeepSeek V3-0324 with its latest release as well. You can use it directly with the tagged docker image on a node of H100s:
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 --model-id deepseek-ai/DeepSeek-V3-0324
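Once the container is running, TGI exposes an OpenAI-compatible Messages API on the mapped port, so a quick smoke test could look like the sketch below (localhost and port 8080 match the docker command above; the API key is a dummy value for a local server without authentication).

from openai import OpenAI

# Talk to the local TGI container started above via its OpenAI-compatible Messages API.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Solve: 12 * 17 = ?"}],
    max_tokens=128,
)

print(completion.choices[0].message.content)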
SGLang
SGLang supports running DeepSeek V3-0324 out of the box, along with the Multi-head Latent Attention and Data Parallelism optimisations. To use it, simply run the following on a node of H100s. For more information follow along here.
docker pull lmsysorg/sglang:latest
docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host --network=host --privileged lmsysorg/sglang:latest \
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --port 30000
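Once the server is up, you can send requests over plain HTTP. The sketch below uses SGLang’s native /generate route on port 30000 from the command above; treat the exact request and response fields as an assumption and check the SGLang docs if they have changed.

import requests

# Query the local SGLang server started above via its native /generate endpoint.
response = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"temperature": 0.3, "max_new_tokens": 32},
    },
)
print(response.json()["text"])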
Dynamic Quants from Unsloth and Llama.cpp
Running large LLMs like DeepSeek V3-0324 can be quite compute intensive and requires a large amount of GPU VRAM. This is where quantization comes in: it allows the end user to use the same model with much lower VRAM consumption, at a small trade-off in downstream performance.
Unsloth AI created dynamic quantisations that let you run DeepSeek V3 with half the compute of a node of H100s using llama.cpp, without much degradation on benchmarks. Read more about it here: https://huggingface.co./unsloth/DeepSeek-V3-0324-GGUF
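As a rough sketch, the GGUF quants can be loaded through the llama-cpp-python bindings. The filename pattern below is a placeholder, so check the Unsloth repo for the actual file names (larger quants are sharded across several files) and the hardware requirements before running.

from llama_cpp import Llama

# Download one of the Unsloth dynamic GGUF quants from the Hub and run it locally.
# The filename glob is a placeholder -- pick the quant that fits your hardware;
# for sharded quants, point at the first shard file.
llm = Llama.from_pretrained(
    repo_id="unsloth/DeepSeek-V3-0324-GGUF",
    filename="*Q2_K_XL*",
    n_gpu_layers=-1,  # offload as many layers as possible to the GPU
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain KV caching in one sentence."}],
    temperature=0.3,
)
print(output["choices"][0]["message"]["content"])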
Is it safe?
Running language models safely has been at the center of attention ever since the first GPT models were released. With the immense popularity of the DeepSeek models and their origin, the question has found new interest. Let us run down what is safe to do and where some caution is a good idea. This is not DeepSeek specific but true for any open model!
First of all - is it safe to even download the model?
Downloading and running the model
Yes, downloading the model is safe. There are a few precautions on the Hub side that make sure it’s safe to download and run models:
- Safetensors: The safetensors format is used to store the DeepSeek model weights on the Hub, ensuring no hidden code execution is possible, which was a risk with the older PyTorch pickle format. Thus no malicious code can be hidden in the weights file. Read more in the Safetensors blog.
- Modeling code: To run the model, the modeling code also needs to be downloaded along with the weight files. There are three mechanisms in place to improve safety there: 1. the files are fully visible on the Hub, 2. the user needs to explicitly set trust_remote_code=True to execute any code associated with the model, 3. a security scanner runs over files on the Hub and flags any malicious code files. If you want to be extra careful you can pin the model version with the revision setting to make sure you download the version of the modeling code that has been reviewed, as shown in the sketch after this list.
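For example, a minimal sketch of pinning the revision with transformers could look like this; the commit hash is a placeholder for whichever revision you actually reviewed.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3-0324"
revision = "<commit-hash-you-reviewed>"  # placeholder: pin the exact revision you audited

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision=revision,
    trust_remote_code=True,  # required to execute the model's custom modeling code
)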
So downloading the weights is safe, and upon code review so is executing the modeling code. This means you can run the DeepSeek model locally without the risk of backdoors or malicious code execution.
So what would be the main risks outside of downloading and running the model? It depends on what you do with the model outputs!
Model outputs
The advice that follows is not specific to any model and applies to both open and closed models, whether the risks stem from built-in secret behaviours in the model or from a model accidentally producing bad outputs.
We’ll cover risks in three areas: alignment, code generation and agents.
Alignment mismatch: Every model provider chooses how and to which values their models are aligned. What these values are and how they are chosen typically remains opaque, and they might also change over time (see this study). The advantage of open models is that the alignment can still be changed with custom fine-tuning at a later stage, as the example of Perplexity’s R1 1776 shows.


As a rule, users should be aware that any LLM is biased in one way or another and treat the model outputs accordingly.
Code generation: One of the most popular use-cases of LLMs is as coding assistants. However, this is also where indiscriminate usage of the model outputs can have the most negative effects. Models are trained on vast amounts of published code, new and old. This typically includes potentially malicious code or code that contains known vulnerabilities. So models might produce similar vulnerabilities when proposing code solutions.
So, how can you prevent security issues when using LLMs for code development? Run thorough code reviews of the proposed changes and scan the code with appropriate tools for vulnerabilities, as you would with any other code contribution.
Agents: In the past few months, agent applications have gained significant interest. Giving LLMs more autonomy and agency, however, also bears risks. It’s important to be careful about what kind of system access agents have and which information you provide them. Some good practices:
- Sandboxes: don’t run agents on your machine where they have access and control of your computer. This avoids leaking private information or accidentally deleting important files.
- Private information: don’t share private information such as logins with the LLM. If you need to give the model access to a system use dedicated access keys with strict access rules.
- Human-in-the-loop: for high stakes processes that you want to automate with agents make sure there is a human in the loop for final confirmation.
TL;DR: Is it safe to run the models? Yes, downloading and running the models is safe, but, as with any model, you should handle the models’ generations with appropriate care and safety measures.