Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper: arXiv 2412.04454
https://aguvis-project.github.io
Note AGUVIS is a unified, purely vision-based framework for autonomous GUI agents that operate across platforms (web, desktop, mobile). Unlike previous approaches that rely on textual representations, AGUVIS takes only visual observations as input and uses a single, consistent action space, which is intended to improve generalization across platforms.
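To make the idea of a consistent action space concrete, here is a minimal sketch in Python. It is not taken from the AGUVIS code: the `Click` class, the `to_pixels` helper, and the choice of normalized [0, 1] coordinates are all assumptions made for illustration. The point is only that one action description can be replayed on screens of different sizes without any platform-specific element tree.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Click:
    """Hypothetical platform-agnostic click action.

    x and y are fractions of the screenshot width and height in [0, 1].
    This representation is an assumption for illustration only.
    """
    x: float
    y: float


def to_pixels(action: Click, width: int, height: int) -> Tuple[int, int]:
    """Map a normalized click onto a concrete screen of width x height pixels."""
    if not (0.0 <= action.x <= 1.0 and 0.0 <= action.y <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Clamp so that x == 1.0 or y == 1.0 still lands on the last valid pixel.
    px = min(int(action.x * width), width - 1)
    py = min(int(action.y * height), height - 1)
    return px, py


# The same action, resolved against a desktop and a mobile screenshot.
click = Click(x=0.5, y=0.25)
desktop_xy = to_pixels(click, 1920, 1080)
mobile_xy = to_pixels(click, 1080, 2400)
```

Because the action carries no reference to HTML nodes or accessibility labels, the same record could in principle be logged for web, desktop, and mobile sessions alike; only the final pixel mapping depends on the device.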
Note AGUVIS data collection, stage 1: computer/mobile grounding data.
Note AGUVIS data collection, stage 2: computer/mobile/desktop trajectory data.