It's December 2nd, and here's your Cyber Monday present 🎁!
We're cutting prices on Hugging Face Inference Endpoints and Spaces!
Our friends at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months, and we have further reductions of 20 to 50% across all instances.
If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo!
I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!
With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:
⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!
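To give you a taste, here's a minimal sketch of deploying a TGI DLC on Vertex AI with the `google-cloud-aiplatform` Python SDK. The project, region, container image tag, and machine/accelerator choices below are illustrative placeholders, so double-check the DLC docs for the current image names and supported hardware:

```python
# Hedged sketch: serve an open model with a Hugging Face TGI DLC on Vertex AI.
# Project, region, image tag, model id, and machine types are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

# Register a Vertex AI Model backed by the TGI Deep Learning Container.
model = aiplatform.Model.upload(
    display_name="zephyr-7b-tgi",
    # Illustrative DLC URI; look up the exact tag in the DLC documentation.
    serving_container_image_uri=(
        "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
        "huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310"
    ),
    serving_container_environment_variables={
        "MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # any open model on the Hub
        "NUM_SHARD": "1",
    },
)

# Deploy to a GPU-backed endpoint.
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```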
Pro tip: if you're a Firefox user, you can set up Hugging Chat as an integrated AI assistant, with contextual links to summarize or simplify any text. Handy!
These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed in 150 datacenters globally. 👏 @rita3ko @mchenco @jtkipp @nkothariCF @philschmid
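If you want to try one, here's a hedged sketch of calling a model through the Workers AI REST API from Python. The account ID, API token, and model slug are placeholders; check the Workers AI model catalog for the exact names:

```python
# Hedged sketch: query an open model on Cloudflare Workers AI over its REST API.
# ACCOUNT_ID, API_TOKEN, and the model slug below are placeholders.
import requests

ACCOUNT_ID = "your-cloudflare-account-id"
API_TOKEN = "your-workers-ai-api-token"
MODEL = "@hf/mistral/mistral-7b-instruct-v0.1"  # illustrative model slug

url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "What is serverless inference?"}]},
)
response.raise_for_status()
print(response.json()["result"]["response"])
```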
New state-of-the-art open LLM! 🚀 Databricks just released DBRX, a 132B MoE trained on 12T tokens. It claims to surpass OpenAI GPT-3.5 and to be competitive with Google Gemini 1.0 Pro. 🤯
TL;DR:
🧮 132B MoE with 16 experts, 4 of which are active per token
🪟 32K context window
📈 Outperforms open LLMs on common benchmarks, including MMLU
🚀 Up to 2x faster inference than Llama 2 70B
💻 Trained on 12T tokens
🔡 Uses the GPT-4 tokenizer
📜 Custom license, commercially usable
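Want to poke at it yourself? Here's a minimal sketch of running `databricks/dbrx-instruct` with transformers, assuming you have the hardware: the full 132B model needs several large GPUs (or heavy quantization), and the generation settings here are illustrative:

```python
# Hedged sketch: run DBRX Instruct with transformers. The full 132B model
# needs multiple large GPUs; the prompt and generation length are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What makes a Mixture-of-Experts model fast?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```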
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share “How to Fine-Tune LLMs in 2024 with Hugging Face” using the latest research techniques, including Flash Attention, Q-LoRA, the OpenAI dataset format (messages), ChatML, and packing, all built with Hugging Face TRL. 🚀
It is created for consumer-size GPUs (24GB) and covers the full end-to-end lifecycle:
💡 Define and understand use cases for fine-tuning
🧑🏻‍💻 Set up the development environment
🧮 Create and prepare the dataset (OpenAI format)
🏋️‍♀️ Fine-tune the LLM using TRL and the SFTTrainer
🥇 Test and evaluate the LLM
🚀 Deploy for production with TGI
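To make the core idea concrete, here's a hedged sketch of the Q-LoRA fine-tuning step with TRL's SFTTrainer, assuming a dataset in the OpenAI "messages" format. The base model, dataset path, and hyperparameters are illustrative placeholders, not the guide's exact values:

```python
# Hedged sketch: Q-LoRA fine-tuning with TRL's SFTTrainer on a 24GB GPU.
# Model id, dataset path, and hyperparameters below are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer, setup_chat_format

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder base model

# 4-bit NF4 quantization (the "Q" in Q-LoRA) so a 7B model fits on 24GB
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2, if installed
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model, tokenizer = setup_chat_format(model, tokenizer)  # ChatML template

# Render each {"messages": [...]} sample to text with the chat template
dataset = load_dataset("json", data_files="train.json", split="train")
dataset = dataset.map(lambda s: {
    "text": tokenizer.apply_chat_template(s["messages"], tokenize=False)
})

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=True,  # pack short samples into full-length sequences
    args=TrainingArguments(
        output_dir="llm-finetune",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()
```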