iopo-exp (IOPO Experiments)

lewtun

posted an update 9 days ago

Post

6441

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM

Here's the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!

2 replies

·

JW17

authored 2 papers about 1 month ago

Stable Language Model Pre-training by Reducing Embedding Variability

Paper • 2409.07787 • Published Sep 12

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Paper • 2410.18027 • Published Oct 23

nlee-208

authored a paper about 2 months ago

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Paper • 2410.18027 • Published Oct 23

nlee-208

updated a model 3 months ago

iopo-exp/uf-qwen2-dpo-tgf_o50-iter1

Text Generation • Updated Sep 26 • 17

nlee-208

authored 2 papers 3 months ago

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Paper • 2406.06424 • Published Jun 10 • 12

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Paper • 2406.05761 • Published Jun 9 • 2

gabrielmbmb

posted an update 4 months ago

Post

1822

Yesterday @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with https://huggingface.co./meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using https://huggingface.co./meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think 💭 and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset gabrielmbmb/distilabel-reflection-tuning you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.

nlee-208

updated a model 4 months ago

iopo-exp/uf-l31-orpo-base-armo-iter1

Updated Aug 30 • 14

alvarobartt

posted an update 4 months ago

Post

2848

🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co./meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co./blog/llama31-on-vertex-ai

gabrielmbmb

posted an update 5 months ago

Post

2900

distilabel 1.3.0 is out! This release contains many core improvements and new tasks that help us building argilla/magpie-ultra-v0.1!

Distributed pipeline execution with Ray, new Magpie tasks, reward models, components for dataset diversity based on sentence embeddings, Argilla 2.0 compatibility and many more features!

Check the new release in GitHub: https://github.com/argilla-io/distilabel

gabrielmbmb

posted an update 5 months ago

Post

3558

Just dropped magpie-ultra-v0.1! The first open synthetic dataset generated with Llama 3.1 405B. Created with distilabel, it's our most advanced and compute-intensive pipeline to date. We made the GPUs of the cluster go brrrrr 🚀

argilla/magpie-ultra-v0.1

Take it a look and tell us what you think! Probably, the models taking the most out of it are smol models 🤗 We will be improving the dataset in upcoming iterations!

gabrielmbmb

posted an update 6 months ago

Post

2503

⚗️ distilabel 1.2.0 is out and it comes with improved support for structured generation, new tasks for generating datasets for training embedding models, new steps for loading data, MixtureOfAgentsLLM and improved docs.

We would love to see a few new datasets for training embedding models built with distilabel on the Hub! ❤️

kashif

authored a paper 7 months ago

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Paper • 2406.06424 • Published Jun 10 • 12

JW17

authored a paper 7 months ago

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Paper • 2406.06424 • Published Jun 10 • 12

alvarobartt

posted an update 8 months ago

Post

3098

🔥 Prometheus 2 was recently released by Kaist AI as an alternative and closely mirroring both human and GPT-4 evaluation, and surpassing the former Prometheus!

prometheus-eval/prometheus-7b-v2.0
prometheus-eval/prometheus-8x7b-v2.0

🌬️Fine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1
🗂️The datasets used for fine-tuning have been publicly released i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection
🤝🏻Unified LM evaluator for absolute (a single prompt-completion pair) and relative (two completions for a given prompt) due to model merging
❌No longer needs a mandatory reference / golden answer, but can still be provided optionally
🔝Surpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs
📝Apache 2.0 license

Long-story short, an amazing job from Kaist AI bridging the gap with LLM evaluators other than proprietary and bigger models!

This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator using distilabel, so we implemented PrometheusEval.

😱 Using PrometheusEval running their 7B variant with vLLM in a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!

Find the generated dataset and the code at distilabel-internal-testing/instruction-dataset-prometheus

1 reply

·

alvarobartt

posted an update 8 months ago

Post

2761

🦫 We have just released argilla/Capybara-Preferences in collaboration with Kaist AI ( @JW17 , @nlee-208 ) and Hugging Face ( @lewtun )

A new synthetic preference dataset built using distilabel on top of the awesome LDJnr/Capybara from @LDJnr

The current dataset combines the already generated alternative completions from argilla/distilabel-capybara-dpo-7k-binarized, while also adding the remaining ones using the same approach!

Here are some key features on how we built it:

- 🧹 Duplicate removal, keeping the conversation besides the last assistant response, and some slight pre-processing

- 🤖 Generation of alternative completions for the existing conversations (last turn only) with: mlabonne/NeuralBeagle14-7B, argilla/notus-7b-v1, and teknium/OpenHermes-2.5-Mistral-7B

- 👨🏻‍🏫 Running UltraFeedback via GPT-4 to generate the critique i.e. ratings and rationales, for the last assistant responses

- 🎉 Finally, we selected the chosen and rejected responses based on their UltraFeedback score, and applied some slight post-processing!

Sounds simple right? Start building your own synthetic datasets with https://github.com/argilla-io/distilabel already!

lewtun

posted an update 9 months ago

Post

5029

Introducing Zephyr 141B-A35B 🪁:

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Yesterday, Mistral released their latest base model (via magnet link of course 😅) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1

Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:

🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm developed by @JW17 and @nlee-208 and @j6mes and does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.

🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr LDJnr/Capybara and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.

What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on chat benchmarks like IFEval and BBH.

Kudos to @alvarobartt @JW17 and @nlee-208 for this very nice and fast-paced collab!

For more details on the paper and dataset, checkout our collection: HuggingFaceH4/zephyr-orpo-6617eba2c5c0e2cc3c151524

JW17

authored a paper 9 months ago

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 63

nlee-208

authored a paper 9 months ago

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 63

IOPO Experiments

AI & ML interests

Recent Activity

iopo-exp's activity

Stable Language Model Pre-training by Reducing Embedding Variability

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Cross-lingual Transfer of Reward Models in Multilingual Alignment

iopo-exp/uf-qwen2-dpo-tgf_o50-iter1

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

iopo-exp/uf-l31-orpo-base-armo-iter1

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

ORPO: Monolithic Preference Optimization without Reference Model

ORPO: Monolithic Preference Optimization without Reference Model

AI & ML interests

Recent Activity

Team members 7

iopo-exp's activity