arxiv:2411.19309

GRAPE: Generalizing Robot Policy via Preference Alignment

Published on Nov 28
· Submitted by Zhaorun on Dec 2
#1 Paper of the day

Abstract

Despite recent advances in vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks, due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs on a trajectory level and implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 44.31% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
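
To make the idea of objective-specific preferences concrete, here is a minimal sketch of how rollouts might be ranked from per-stage constraint costs. Everything in it is an assumption for illustration: the cost names (`collision`, `path_length`), the weights, the rule that successful rollouts always outrank failed ones, and the helper names `rank` and `preference_pairs` do not come from the paper or its code.

```python
from itertools import combinations

def rank(rollouts, weights):
    """Return rollout names sorted best-first.

    rollouts: name -> (success flag, {cost name: [cost per stage]})
    weights:  cost name -> weight (hypothetical knobs selecting the objective)
    Assumption: success is compared first, then lower weighted cost wins.
    """
    def score(item):
        _, (success, costs) = item
        total = sum(weights.get(name, 0.0) * sum(per_stage)
                    for name, per_stage in costs.items())
        return (1 if success else 0, -total)

    ordered = sorted(rollouts.items(), key=score, reverse=True)
    return [name for name, _ in ordered]

def preference_pairs(ranking):
    """Turn a best-first ranking into (chosen, rejected) pairs."""
    return list(combinations(ranking, 2))

# Toy rollouts with two stages each (made-up numbers).
rollouts = {
    "a": (True,  {"collision": [0.0, 0.1], "path_length": [1.0, 1.2]}),
    "b": (True,  {"collision": [0.5, 0.4], "path_length": [0.8, 0.9]}),
    "c": (False, {"collision": [0.0, 0.0], "path_length": [0.5, 0.5]}),
}
safety = {"collision": 1.0, "path_length": 0.0}
efficiency = {"collision": 0.0, "path_length": 1.0}

safe_ranking = rank(rollouts, safety)
fast_ranking = rank(rollouts, efficiency)
```

Changing only the weights flips which successful rollout is preferred, which is the sense in which the same pool of rollouts could serve a safety objective or an efficiency objective.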

Community

Paper author Paper submitter

Can vision-language-action (VLA) models generalize to diverse OOD tasks and align with customized objectives? 🤔
🚀 We introduce GRAPE, a plug-and-play algorithm to generalize robot policies via preference alignment. Specifically, GRAPE offers three benefits that boost the generalizability of VLAs:

👉1. GRAPE aligns VLAs on a trajectory level via an RL objective and endows the model with the ability for global decision-making, instead of merely cloning behavior;
👉2. GRAPE implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks;
👉3. GRAPE adopts a scalable preference synthesis algorithm to rank trajectories with preferences that align with arbitrary objectives.
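
The first two points can be illustrated with a minimal sketch of a trajectory-level preference loss. It assumes a DPO-style objective in which the implicit reward of a trajectory is the summed per-step log-probability ratio between the policy and a frozen reference policy; the comment above does not state the exact loss, so the form below, the function name `trajectory_preference_loss`, and the default `beta` are assumptions rather than the authors' implementation.

```python
import math

def trajectory_preference_loss(policy_chosen, policy_rejected,
                               ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss over whole trajectories (illustrative assumption).

    Each argument is a list of per-step action log-probabilities for one
    trajectory; summing them gives the trajectory log-probability.
    """
    chosen_reward = sum(policy_chosen) - sum(ref_chosen)        # implicit reward, chosen
    rejected_reward = sum(policy_rejected) - sum(ref_rejected)  # implicit reward, rejected
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Toy log-probabilities (made-up numbers) for a 3-step chosen and rejected rollout.
ref_w, ref_l = [-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0]
loss_at_reference = trajectory_preference_loss(ref_w, ref_l, ref_w, ref_l)
loss_after_update = trajectory_preference_loss([-0.5, -0.5, -0.5], [-2.0, -2.0, -2.0],
                                               ref_w, ref_l)
```

Because the loss depends on the rejected trajectory as well as the chosen one, failed rollouts contribute a training signal, unlike behavior cloning on successes alone.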

Our experiments on a diverse array of real-world and simulated robotic tasks reveal:

👉1. GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively;
👉2. GRAPE can be aligned with diverse objectives: it reduces collision rates by 44.31% when aligned towards safer manipulation and rollout length by 11.15% when aligned towards more efficient manipulation.

Check out our full project for more details:
🔥 Paper: https://arxiv.org/pdf/2411.19309
🔥 Project: https://grape-vla.github.io/
🔥 Code: https://github.com/aiming-lab/grape

