
Aymeric Roucher

m-ric

AI & ML interests

Leading Agents at Hugging Face 🤗

Recent Activity

updated a collection about 20 hours ago
GUI Agents
posted an update about 21 hours ago

Organizations

Hugging Face, Orange, Atmos Bank, Hugging Test Lab, Tools, HuggingFaceM4, lecocqassociate, huggingPartyParis, Supreme, FactSet, Propulse Lab, Leaderboard Organization, CGIAR, Aperture Laboratories, AI Energy Score, C&A, Social Post Explorers, Dev Mode Explorers, Agent Collab, SLLHF, Data Agents, Hugging Face Party @ PyTorch Conference, Hugging Face FineVideo, Nerdy Face, Hugging Face Science, Agents Leaderboard, smolagents, Hugging Face Agents Course, Open R1, SIMS, Open Agents, GeekAgents

Posts 97

🚀 The DeepSeek R1 moment has come for GUI agents: Rule-based Reinforcement Learning gives better results than SFT with 500x smaller datasets!

Traditionally (by which I mean "in the last few months"), GUI agents have been trained with supervised fine-tuning (SFT). This meant collecting huge datasets of screen captures from people using computers and using them to fine-tune your model. 📚

👉 But last week, a new paper introduced UI-R1, applying DeepSeek's R1-style rule-based reinforcement learning (RL) specifically to GUI action prediction tasks.
This is big news: with RL, maybe we could build good agents without the need for huge datasets.

UI-R1 uses a unified rule-based reward function that scores multiple sampled responses from the model, optimizing the policy with algorithms like Group Relative Policy Optimization (GRPO).

Specifically, the reward function assesses (see the sketch after this list):
🎯 Action type accuracy: Does the predicted action match the ground truth?
📍 Coordinate accuracy (specifically for clicks): Is the predicted click within the correct bounding box?
📑 Output format: Does the model clearly articulate both its reasoning and final action?

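To make that concrete, here is a minimal sketch of what such a rule-based reward, plus the group-relative advantage normalization GRPO applies on top of it, could look like in Python. This is not UI-R1's actual code: the response format (a `<think>` block followed by a JSON action), the field names, and the reward weights are all assumptions for illustration.

```python
# Minimal sketch of a rule-based reward for GUI action prediction, in the spirit
# of UI-R1 (NOT the paper's actual code). Response format, field names, and
# weights are hypothetical.
import json
import re


def rule_based_reward(response: str, gt_action: str, gt_bbox: tuple) -> float:
    """Score one model response against a ground-truth action and click bounding box."""
    reward = 0.0

    # 1) Output-format reward: a <think> block followed by a parseable JSON action.
    match = re.search(r"<think>.*?</think>\s*(\{.*\})", response, re.DOTALL)
    if match is None:
        return reward  # no well-formed output, nothing else to score
    reward += 0.5

    try:
        action = json.loads(match.group(1))
    except json.JSONDecodeError:
        return reward

    # 2) Action-type reward: predicted action type matches the ground truth.
    if action.get("action") == gt_action:
        reward += 1.0

    # 3) Coordinate reward (clicks only): predicted point falls inside the ground-truth box.
    if gt_action == "click" and action.get("action") == "click":
        x, y = action.get("x", -1), action.get("y", -1)
        x1, y1, x2, y2 = gt_bbox
        if x1 <= x <= x2 and y1 <= y <= y2:
            reward += 1.0

    return reward


# GRPO-style use of the reward: sample a group of responses per prompt, score each
# with the rules above, then normalize within the group to get advantages.
def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    responses = [
        '<think>The button is at (120, 300).</think> {"action": "click", "x": 120, "y": 300}',
        '<think>Maybe scroll first.</think> {"action": "scroll", "direction": "down"}',
    ]
    rewards = [rule_based_reward(r, "click", (100, 280, 160, 320)) for r in responses]
    print(rewards, group_relative_advantages(rewards))
```

The point of this style of reward is that every term is a deterministic rule check rather than a learned reward model, which is what lets such a small task set go a long way.
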
Using just 136 carefully selected mobile tasks—compared to 76,000 tasks for larger models like OS-Atlas—UI-R1 shows significant efficiency and improved performance:
📈 Boosted action prediction accuracy from 76% to 89% on AndroidControl.
🌐 Outperformed larger, SFT-trained models (e.g., OS-Atlas-7B), demonstrating superior results with vastly fewer data points (136 tasks vs. 76K).
🔍 Enhanced adaptability and generalization, excelling even in out-of-domain scenarios.

The paper tests this RL-based method only in low-level GUI tasks. Could it generalize to more complex interactions? 🧐

Read the full paper here 👉 UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning (https://huggingface.co./papers/2503.21620)

Articles 10

Trace & Evaluate your Agent with Arize Phoenix