Hugging Face Welcomes the Qwen2.5-Coder Series

Community Article · Published November 12, 2024


On November 12, 2024, the Qwen team published state-of-the-art (SoTA) language models that match the coding capabilities of GPT-4o. The models have been very well received by the community, and I tip my hat to the Qwen team for being open with their research.

If you want to know more about the model series or the benchmarks, head over to the official blog post. If you are still reading, let me set up some context for what follows: in this post, I share my thoughts on the release and what I really like about it.

Quick links:

  1. Qwen2.5-Coder Series
  2. Qwen2.5-Coder Assistant Space
  3. Qwen2.5-Coder Artifact Space
  4. Hugging Chat with Qwen2.5-Coder-32B

Should There Be Closed-Source Releases?

Working at Hugging Face makes me a strong believer in open-source research. I wholeheartedly believe that research, if done in the open, will flourish and lead to a better and more robust future.

While I do not advocate for closed research, I think it is necessary to some extent. People who put their heart and soul into open publication need a benchmark to surpass. This builds a competitive scenario which, in my opinion, will always help accelerate innovation. My stance and my work aim to nurture the healthy side of competition and restrict the toxic aspects.

Qwen2.5-Coder Series


Now that I have laid out some context on competition and openness, I can say the Qwen team harnesses the healthy kind. Their openness with the community and their effort to compete with closed-source research are truly mesmerizing. I, for one, did not expect the new series of coder models to come this close to GPT-4o's capabilities.

I personally love what the series advocates:

  1. Powerful: As seen from the benchmarks released by the team, the models deliver top-notch performance in coding tasks.
  2. Diverse: The series ships in multiple sizes, ranging from 0.5B to 32B parameters, so everyone can pick the variant that fits their use case and hardware.
  3. Practical: When used as coding assistants, the models can significantly improve efficiency and productivity.

Artifacts for the Release

In this section, I highlight some of the artifacts accompanying the release.

Hugging Face Spaces

Hugging Face Spaces offer a simple way to host ML demo apps directly on your profile or your organization's profile. The quickest and easiest way to play around with the Qwen2.5-Coder Series is through the Hugging Face Spaces created by the Qwen team.

The Qwen team has built two Spaces around the models:

  1. Coder Assistant: Here, you can chat with all the instruction-tuned models. It's a fantastic way to explore the models' capabilities in understanding and generating code based on your prompts.


  2. Coder Artifact: This is a personal favorite of mine. You can use the 32B model to create applications for you. If you have used Claude and its artifacts, this will feel right at home. It's an excellent tool for generating code artifacts based on high-level descriptions.


An interesting aspect is that these Spaces are built with Gradio 5 (huge props to the Gradio team for this). If you want to know more about Gradio and how to create ML demos with it, head over to the documentation.
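To give a feel for how little code such a demo needs, here is a minimal sketch that loads the model into a ready-made Gradio UI. The gr.load helper is real; whether the 32B model is being served through hosted inference at any given moment is an assumption, so treat this as illustrative:

import gradio as gr

# Build a demo UI backed by the model on the Hugging Face Hub.
# Hosted-inference availability for the 32B model is an assumption;
# you may need to point this at your own endpoint instead.
demo = gr.load("models/Qwen/Qwen2.5-Coder-32B-Instruct")
demo.launch()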


Getting Started with the Hugging Face Hub

Now that you are impressed by the demos, you might want to interact with the models through your own code. This is where 🤗 Transformers comes into play.

You can either use the pipeline API, which abstracts away the tokenization and model calls:

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Write a quick sort algorithm."},
]
# torch_dtype="auto" and device_map="auto" let Accelerate handle placement;
# the 32B checkpoint still needs a serious amount of GPU memory.
pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"])  # the conversation, including the model's reply

Or you can drop down a level for greater control over tokenization and generation:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"

# Load the weights; dtype and device placement are handled automatically
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's expected prompt format
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Deploying the Model

"The series of models looks too good; I am sure it cannot be deployed easily." Let's prove you wrong. Here are the one-click ways of deploying the model and using it with an API.


Deploying models to your favorite endpoint has never been easier. With just a few clicks, you can have the model up and running, ready to serve your applications.
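Once the model sits behind an endpoint (or is served through Hugging Face's hosted inference), querying it from Python takes only a few lines. Here is a minimal sketch using InferenceClient from huggingface_hub; whether the 32B model is reachable through the free serverless tier at any given moment is an assumption, and you can pass your own endpoint URL to the client instead:

from huggingface_hub import InferenceClient

# Pass a model id (hosted inference) or the URL of your own endpoint.
# Serverless availability for a 32B model is an assumption, not a promise.
client = InferenceClient("Qwen/Qwen2.5-Coder-32B-Instruct")

response = client.chat_completion(
    messages=[{"role": "user", "content": "Write a quick sort algorithm."}],
    max_tokens=512,
)
print(response.choices[0].message.content)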

"Smol Model When?"

We know that people are leaning towards local models, and the Qwen team really cares about the community: they have also published the models in the GGUF format, so you can run them on your own machine. This means you can leverage the power of the Qwen2.5-Coder models without relying on cloud services.

The GGUF weights work with the familiar local runtimes, such as llama.cpp, Ollama, and llama-cpp-python.
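As a concrete example, here is a minimal sketch using llama-cpp-python, which can pull a GGUF file straight from the Hub. The repo id follows the team's naming scheme, but the exact quantization filename is an assumption, so treat the glob pattern as illustrative and check the repo's file list:

from llama_cpp import Llama

# Download a quantized GGUF file from the Hub and load it locally.
# The filename glob is an assumption; check the repo for available quants.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2.5-Coder-7B-Instruct-GGUF",
    filename="*q4_k_m*.gguf",
    n_ctx=4096,  # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a quick sort algorithm."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])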

Fine-Tuning the Model

The models already support more than 40 programming languages, but sometimes that is not enough. You might want to fine-tune a model on a language that was underrepresented in the training data, or customize it for your specific domain.

All you need is a good dataset, and we've got the rest covered: use SageMaker or AutoTrain to fine-tune the model on your custom data, and you are good to go.
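And if you'd rather run the fine-tuning yourself, parameter-efficient methods keep the hardware requirements sane. Here is a minimal LoRA sketch using the peft library; the smaller 7B checkpoint and the target module names (which follow the Qwen2 attention layout) are assumptions worth verifying against your setup:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# A smaller variant keeps this runnable on a single GPU (an assumption
# about your hardware; scale up or down as needed).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)

# Attach LoRA adapters to the attention projections; these hyperparameters
# are illustrative starting points, not tuned values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights will train

From there, plug the wrapped model into whichever training loop you already use; only the adapter weights train, so checkpoints stay small.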


Conclusion

The Qwen2.5-Coder Series represents a significant step forward in open-source AI research, especially in the realm of code generation. By openly sharing their cutting-edge models, the Qwen team not only pushes the boundaries of what's possible but also fosters a community where innovation thrives through collaboration.

Their commitment to openness, diversity, and practicality sets a benchmark for others to follow. Whether you are a researcher, developer, or enthusiast, these models offer powerful tools to enhance your work.

Be open, be like Qwen.