Alvaro Bartolome's picture

Alvaro Bartolome

alvarobartt

AI & ML interests

machine learning @huggingface

Recent Activity

reacted to merve's post with πŸ‘€ 3 days ago
Oof, what a week! πŸ₯΅ So many things have happened, let's recap! https://huggingface.co./collections/merve/jan-24-releases-6793d610774073328eac67a9 Multimodal πŸ’¬ - We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG πŸ’— - UI-TARS are new models by ByteDance to unlock agentic GUI control 🀯 in 2B, 7B and 72B - Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B - MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context - Dataset: Yale released a new benchmark called MMVU - Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark LLMs πŸ“– - DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🀯 - Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B - NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!) Audio πŸ—£οΈ - Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B - TangoFlux is a new audio generation model trained from scratch and aligned with CRPO Image/Video/3D Generation ⏯️ - Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux - tencent released Hunyuan3D-2, new 3D asset generation from images
reacted to merve's post with πŸ€— 3 days ago
Oof, what a week! πŸ₯΅ So many things have happened, let's recap! https://huggingface.co./collections/merve/jan-24-releases-6793d610774073328eac67a9 Multimodal πŸ’¬ - We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with it's retrieval models ColSmol for multimodal RAG πŸ’— - UI-TARS are new models by ByteDance to unlock agentic GUI control 🀯 in 2B, 7B and 72B - Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B - MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context - Dataset: Yale released a new benchmark called MMVU - Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark LLMs πŸ“– - DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🀯 - Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B - NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!) Audio πŸ—£οΈ - Llasa is a new speech synthesis model based on Llama that comes in 1B,3B, and 8B - TangoFlux is a new audio generation model trained from scratch and aligned with CRPO Image/Video/3D Generation ⏯️ - Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux - tencent released Hunyuan3D-2, new 3D asset generation from images
View all activity

Articles

Organizations

Hugging Face's profile picture Spaces-explorers's profile picture Hackathon Somos NLP 2023: Los LLMs hablan Español's profile picture SomosNLP's profile picture Hugging Test Lab's profile picture Open-Source AI Meetup's profile picture Hugging Face H4's profile picture Argilla's profile picture Blog-explorers's profile picture ZeroGPU Explorers's profile picture gg-hf's profile picture MLX Community's profile picture Argilla Explorers's profile picture distilabel-internal-testing's profile picture Data Is Better Together's profile picture ORPO Explorers's profile picture Social Post Explorers's profile picture Hugging Face Discord Community's profile picture LLHF's profile picture SLLHF's profile picture Hugging Quants's profile picture blhf's profile picture Argilla Warehouse's profile picture nltpt's profile picture IOPO Experiments's profile picture Google Cloud 🀝🏻 Hugging Face's profile picture Huggingface HUGS's profile picture Data Is Better Together Contributor's profile picture

Posts 5

view post
Post
2940
πŸ€— Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co./meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co./blog/llama31-on-vertex-ai
view post
Post
3194
πŸ”₯ Prometheus 2 was recently released by Kaist AI as an alternative and closely mirroring both human and GPT-4 evaluation, and surpassing the former Prometheus!

prometheus-eval/prometheus-7b-v2.0
prometheus-eval/prometheus-8x7b-v2.0

🌬️Fine-tuned on top of mistralai/Mistral-7B-Instruct-v0.2 and mistralai/Mixtral-8x7B-Instruct-v0.1
πŸ—‚οΈThe datasets used for fine-tuning have been publicly released i.e. prometheus-eval/Feedback-Collection and prometheus-eval/Preference-Collection
🀝🏻Unified LM evaluator for absolute (a single prompt-completion pair) and relative (two completions for a given prompt) due to model merging
❌No longer needs a mandatory reference / golden answer, but can still be provided optionally
πŸ”Surpasses the former version of Prometheus, and has a high correlation with human, GPT-4, and Claude 3 Opus scores when evaluating LMs
πŸ“Apache 2.0 license

Long-story short, an amazing job from Kaist AI bridging the gap with LLM evaluators other than proprietary and bigger models!

This week at Argilla, we decided to add a new task to use Prometheus 2 as an LLM evaluator using distilabel, so we implemented PrometheusEval.

😱 Using PrometheusEval running their 7B variant with vLLM in a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!

Find the generated dataset and the code at distilabel-internal-testing/instruction-dataset-prometheus