🔥 Features & benefits
The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud’s infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).
One command is all you need
With the new Hugging Face DLCs, you can train cutting-edge Transformers-based NLP models with a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries needed to run a single command, e.g. via the TRL CLI, to fine-tune LLMs in any setting, from single-GPU to single-node multi-GPU and beyond.
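As a minimal sketch of what such a single command can look like, here is a supervised fine-tuning run with the TRL CLI; the model and dataset IDs are illustrative placeholders, not a recommendation:

```shell
# Illustrative only: fine-tune a model with TRL's SFT command inside the
# Hugging Face PyTorch training DLC. Model and dataset IDs are placeholders.
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir ./sft-output
```

On a single-node multi-GPU machine, the same command can be launched through `accelerate launch` instead, without changing the training arguments.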
Accelerate machine learning from science to production
In addition to the Hugging Face DLCs, we created a first-class Hugging Face library for inference, huggingface-inference-toolkit, which ships with the Hugging Face PyTorch DLCs for inference and fully supports serving any PyTorch model on Google Cloud.
Deploy your trained models for inference with just one more line of code, or select any of the 170,000+ publicly available models on the Hugging Face Hub and deploy them on either Vertex AI or GKE.
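The following is a hedged sketch of such a deployment on Vertex AI using the `google-cloud-aiplatform` SDK. The project, region, machine type, and DLC image URI are placeholders you would substitute with the values for your environment:

```python
# Illustrative sketch: deploy a Hugging Face Hub model on Vertex AI.
# Project, region, machine type, and the DLC image URI are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="distilbert-sentiment",
    # Hugging Face PyTorch inference DLC (placeholder URI; look up the
    # current image in Google's Deep Learning Containers registry)
    serving_container_image_uri="us-docker.pkg.dev/.../huggingface-pytorch-inference:latest",
    serving_container_environment_variables={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
)
endpoint = model.deploy(machine_type="g2-standard-4")
```

Once deployed, `endpoint.predict(...)` sends requests to the model behind a managed, autoscaling Vertex AI endpoint.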
High-performance text generation and embedding
Besides the PyTorch-oriented DLCs, Hugging Face also provides high-performance inference for both text generation and embedding models via the Hugging Face DLCs for both Text Generation Inference (TGI) and Text Embeddings Inference (TEI), respectively.
The Hugging Face DLC for TGI enables you to deploy any of the 140,000+ text generation models on the Hugging Face Hub that TGI supports, or any custom model whose architecture is supported by TGI.
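As a sketch, serving a text generation model with the TGI DLC via Docker can look like the following; the image URI and tag are placeholders for the current entry in Google's Deep Learning Containers registry, and the model ID is illustrative:

```shell
# Illustrative only: serve a text generation model with the TGI DLC.
# Image URI and model ID are placeholders; gated models need an HF token.
docker run --gpus all -p 8080:8080 \
    -e MODEL_ID=meta-llama/Llama-3.1-8B-Instruct \
    -e HF_TOKEN=$HF_TOKEN \
    us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference:latest
```

The running container then accepts generation requests over HTTP, e.g. on TGI's `/generate` route.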
The Hugging Face DLC for TEI enables you to deploy any of the 10,000+ embedding, re-ranking, or sequence classification models on the Hugging Face Hub, or any custom model whose architecture is supported by TEI.
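A corresponding sketch for TEI, again with a placeholder image URI and an illustrative model ID (embedding models are small enough that a CPU container is often sufficient):

```shell
# Illustrative only: serve an embedding model with the TEI DLC (CPU).
# Image URI and model ID are placeholders.
docker run -p 8080:8080 \
    -e MODEL_ID=BAAI/bge-base-en-v1.5 \
    us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference:latest
```

Embeddings can then be requested over HTTP, e.g. via TEI's `/embed` route with a JSON body such as `{"inputs": "some text"}`.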
Additionally, these DLCs come with full Google Cloud support, meaning that deploying models from Google Cloud Storage (GCS) is straightforward and requires no extra configuration.
Built-in performance
Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload.
The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling the use of the latest generation of instances available on Google Cloud Compute Engine.
Hugging Face Inference DLCs provide production-ready endpoints that scale quickly with your Google Cloud environment, along with built-in monitoring and a host of enterprise features.
Read more about Vertex AI and GKE in their respective official documentation.