Inference Endpoints

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

inferenceendpoints's activity

lewtunย 
posted an update 10 days ago
view post
Post
6442
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute ๐Ÿ”ฅ

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

๐Ÿ“ˆ Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

๐ŸŽ„ Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

๐Ÿงญ Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM

Here's the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
  • 2 replies
ยท
jeffboudierย 
posted an update about 1 month ago
nbroadย 
posted an update about 2 months ago
view post
Post
3549
hi florent and livestream!
ยท
jeffboudierย 
posted an update 3 months ago
jeffboudierย 
posted an update 3 months ago
view post
Post
450
Inference Endpoints got a bunch of cool updates yesterday, this is my top 3
jeffboudierย 
posted an update 3 months ago
view post
Post
4032
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as integrated AI Assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up
  • 2 replies
ยท
jeffboudierย 
posted an update 8 months ago
lewtunย 
posted an update 9 months ago
view post
Post
5030
Introducing Zephyr 141B-A35B ๐Ÿช:

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Yesterday, Mistral released their latest base model (via magnet link of course ๐Ÿ˜…) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1

Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:

๐Ÿง‘โ€๐Ÿณ Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm developed by @JW17 and @nlee-208 and @j6mes and does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.

๐Ÿฆซ Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr LDJnr/Capybara and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.

What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on chat benchmarks like IFEval and BBH.

Kudos to @alvarobartt @JW17 and @nlee-208 for this very nice and fast-paced collab!

For more details on the paper and dataset, checkout our collection: HuggingFaceH4/zephyr-orpo-6617eba2c5c0e2cc3c151524
jeffboudierย 
posted an update 9 months ago
philschmidย 
posted an update 9 months ago
view post
Post
6897
New state-of-the-art open LLM! ๐Ÿš€ Databricks just released DBRX, a 132B MoE trained on 12T tokens. Claiming to surpass OpenAI GPT-3.5 and is competitive with Google Gemini 1.0 Pro. ๐Ÿคฏ

TL;DR
๐Ÿงฎ 132B MoE with 16 experts with 4 active in generation
๐ŸชŸ 32 000 context window
๐Ÿ“ˆ Outperforms open LLMs on common benchmarks, including MMLU
๐Ÿš€ Up to 2x faster inference than Llama 2 70B
๐Ÿ’ป Trained on 12T tokens
๐Ÿ”ก Uses the GPT-4 tokenizer
๐Ÿ“œ Custom License, commercially useable

Collection: databricks/dbrx-6601c0852a0cdd3c59f71962
Demo: databricks/dbrx-instruct

Kudos to the Team at Databricks and MosaicML for this strong release in the open community! ๐Ÿค—
ยท
lewtunย 
posted an update 10 months ago
view post
Post
Can we align code generation models to be good at chat without compromising their base capabilities ๐Ÿค”?

This was the question the H4 team asked itself when BigCode released StarCoder2 a bit over a week ago. We knew that code models like deepseek-ai/deepseek-coder-6.7b-instruct and m-a-p/OpenCodeInterpreter-DS-33B get impressive scores on code benchmarks like HumanEval, but they tend to score poorly on chat benchmarks like MT Bench and IFEval. We also knew that the Zephyr recipe we applied to Mistral 7B produced a strong chat model, so we wondered -- could be tweaked to produce a strong coding assistant?

It turns out the answer is yes and I'm happy to share StarChat2, a DPO fine-tune of StarCoder2 15B that scores highly on both HumanEval and MT Bench / IFEval ๐ŸŒŸ!

The most interesting lesson for me was that you get better models by blending in more code/math data than chat during the SFT step - in terms of tokens, we found a ratio of 3:1 worked best.

Anyway, here's a demo of the model, along with all the code and datasets we used to train it:

* Demo: HuggingFaceH4/starchat2-playground
* Collection: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
* Recipe: https://github.com/huggingface/alignment-handbook

Hope it's useful to others!
  • 3 replies
ยท
philschmidย 
posted an update 11 months ago
view post
Post
What's the best way to fine-tune open LLMs in 2024? Look no further! ๐Ÿ‘€ย I am excited to share โ€œHow to Fine-Tune LLMs in 2024 with Hugging Faceโ€ using the latest research techniques, including Flash Attention, Q-LoRA, OpenAI dataset formats (messages), ChatML, Packing, all built with Hugging Face TRL. ๐Ÿš€

It is created for consumer-size GPUs (24GB) covering the full end-to-end lifecycle with:
๐Ÿ’กDefine and understand use cases for fine-tuning
๐Ÿง‘๐Ÿปโ€๐Ÿ’ปย Setup of the development environment
๐Ÿงฎย Create and prepare dataset (OpenAI format)
๐Ÿ‹๏ธโ€โ™€๏ธย Fine-tune LLM using TRL and the SFTTrainer
๐Ÿฅ‡ย Test and evaluate the LLM
๐Ÿš€ย Deploy for production with TGI

๐Ÿ‘‰ย  https://www.philschmid.de/fine-tune-llms-in-2024-with-trl

Coming soon: Advanced Guides for multi-GPU/multi-Node full fine-tuning and alignment using DPO & KTO. ๐Ÿ”œ
ยท