It's December 2nd, and here's your Cyber Monday present 🎁!
We're cutting prices on Hugging Face Inference Endpoints and Spaces!
Our friends at Google Cloud are treating us to a 40% price cut on GCP NVIDIA A100 GPUs for the next 3️⃣ months, and we have further reductions of 20 to 50% across all instances.
If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo!
I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud to transform the way you build AI with open models on this platform!
With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:
⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!
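To give you a taste, here's a minimal sketch of deploying a TGI DLC on Vertex AI with the `google-cloud-aiplatform` Python SDK. The project, region, container image tag, and machine/accelerator choices below are illustrative placeholders, so double-check the DLC docs for the current image names and supported hardware:

```python
# Hedged sketch: serve an open model with a Hugging Face TGI DLC on Vertex AI.
# Project, region, image tag, model id, and machine types are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

# Register a Vertex AI Model backed by the TGI Deep Learning Container.
model = aiplatform.Model.upload(
    display_name="zephyr-7b-tgi",
    # Illustrative DLC URI; look up the exact tag in the DLC documentation.
    serving_container_image_uri=(
        "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
        "huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310"
    ),
    serving_container_environment_variables={
        "MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # any open model on the Hub
        "NUM_SHARD": "1",
    },
)

# Deploy to a GPU-backed endpoint.
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```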
Pro tip: if you're a Firefox user, you can set up Hugging Chat as an integrated AI assistant, with contextual links to summarize or simplify any text. Handy!
These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed in 150 datacenters globally. 👏 @rita3ko @mchenco @jtkipp @nkothariCF @philschmid
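If you want to try one, here's a hedged sketch of calling a model through the Workers AI REST API from Python. The account ID, API token, and model slug are placeholders; check the Workers AI model catalog for the exact names:

```python
# Hedged sketch: query an open model on Cloudflare Workers AI over its REST API.
# ACCOUNT_ID, API_TOKEN, and the model slug below are placeholders.
import requests

ACCOUNT_ID = "your-cloudflare-account-id"
API_TOKEN = "your-workers-ai-api-token"
MODEL = "@hf/mistral/mistral-7b-instruct-v0.1"  # illustrative model slug

url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
response = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "What is serverless inference?"}]},
)
response.raise_for_status()
print(response.json()["result"]["response"])
```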
New state-of-the-art open LLM! 🚀 Databricks just released DBRX, a 132B MoE trained on 12T tokens. It claims to surpass OpenAI GPT-3.5 and to be competitive with Google Gemini 1.0 Pro. 🤯
TL;DR:
🧮 132B MoE with 16 experts, 4 of which are active per token
🪟 32K context window
📈 Outperforms open LLMs on common benchmarks, including MMLU
🚀 Up to 2x faster inference than Llama 2 70B
💻 Trained on 12T tokens
🔡 Uses the GPT-4 tokenizer
📜 Custom license, commercially usable
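Want to poke at it yourself? Here's a minimal sketch of running `databricks/dbrx-instruct` with transformers, assuming you have the hardware: the full 132B model needs several large GPUs (or heavy quantization), and the generation settings here are illustrative:

```python
# Hedged sketch: run DBRX Instruct with transformers. The full 132B model
# needs multiple large GPUs; the prompt and generation length are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "What makes a Mixture-of-Experts model fast?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```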
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share “How to Fine-Tune LLMs in 2024 with Hugging Face” using the latest research techniques, including Flash Attention, Q-LoRA, the OpenAI dataset format (messages), ChatML, and packing, all built with Hugging Face TRL. 🚀
It is created for consumer-size GPUs (24GB) and covers the full end-to-end lifecycle:
💡 Define and understand use cases for fine-tuning
🧑🏻‍💻 Set up the development environment
🧮 Create and prepare the dataset (OpenAI format)
🏋️‍♀️ Fine-tune the LLM using TRL and the SFTTrainer
🥇 Test and evaluate the LLM
🚀 Deploy for production with TGI
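To make the core idea concrete, here's a hedged sketch of the Q-LoRA fine-tuning step with TRL's SFTTrainer, assuming a dataset in the OpenAI "messages" format. The base model, dataset path, and hyperparameters are illustrative placeholders, not the guide's exact values:

```python
# Hedged sketch: Q-LoRA fine-tuning with TRL's SFTTrainer on a 24GB GPU.
# Model id, dataset path, and hyperparameters below are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer, setup_chat_format

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder base model

# 4-bit NF4 quantization (the "Q" in Q-LoRA) so a 7B model fits on 24GB
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # Flash Attention 2, if installed
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model, tokenizer = setup_chat_format(model, tokenizer)  # ChatML template

# Render each {"messages": [...]} sample to text with the chat template
dataset = load_dataset("json", data_files="train.json", split="train")
dataset = dataset.map(lambda s: {
    "text": tokenizer.apply_chat_template(s["messages"], tokenize=False)
})

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=True,  # pack short samples into full-length sequences
    args=TrainingArguments(
        output_dir="llm-finetune",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        bf16=True,
    ),
)
trainer.train()
```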