- We found that VLMs can self-improve their reasoning performance through a reflection mechanism and, importantly, that this approach scales with test-time compute.
- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
TL;DR: OS-Atlas offers:
1. State-of-the-Art GUI Grounding: Helps GUI agents accurately locate GUI elements.
2. Strong OOD Performance and Cross-platform Compatibility: Excels in out-of-domain agentic tasks across macOS, Windows, Linux, Android, and Web.
3. Complete Infrastructure for GUI Data Synthesis: You can easily build your own OS agent upon it!
Excited to release a series of checkpoints!
- Final checkpoints after self-training with the ENVISIONS framework
- Cover math, logic, and agent domains
- Include 7B / 13B
Check out our paper:
Title: Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models
Link: https://arxiv.org/abs/2406.11736