Introducing GUICourse!
By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUI agents.
Paper: GUICourse: From General Vision Language Models to Versatile GUI Agents (2406.11317)
GitHub Repo: https://github.com/yiye3/GUICourse
Dataset: yiye2023/GUIAct / yiye2023/GUIChat / yiye2023/GUIEnv
Model: RhapsodyAI/minicpm-guidance / RhapsodyAI/qwen_vl_guidance