RoboPoint-v1-Llama2-13B
RoboPoint is an open-source vision-language model instruction-tuned on a mix of robotics and VQA data. Given an image and a language instruction, it outputs action guidance as points on the image.
Primary Use Cases
RoboPoint predicts spatial affordances, that is, where an action should be taken in relation to other entities in the scene, based on a language instruction. For example, it can identify free space on a shelf in front of the rightmost object.
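As a rough illustration of how the predicted points might be consumed downstream, the sketch below converts a text response into pixel coordinates. It assumes the model answers with a list of normalized `(x, y)` tuples in the range 0 to 1, such as `[(0.5, 0.25), (0.7, 0.3)]`; that format is an assumption here and should be checked against the code repository. The helper name `parse_points` is hypothetical and not part of RoboPoint.

```python
import re

# Matches one "(x, y)" pair of non-negative decimal numbers.
_POINT_RE = re.compile(r"\(\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\)")


def parse_points(response, width, height):
    """Convert a text response into a list of (column, row) pixel coordinates.

    Assumption: `response` contains tuples of normalized coordinates in [0, 1],
    with x measured along the image width and y along the image height.
    Pairs outside that range are skipped.
    """
    pixels = []
    for x_str, y_str in _POINT_RE.findall(response):
        x, y = float(x_str), float(y_str)
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue  # ignore malformed or out-of-range points
        # Scale to pixels and clamp so that 1.0 maps to the last valid index.
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        pixels.append((col, row))
    return pixels


if __name__ == "__main__":
    # Hypothetical response text for a 640x480 image.
    print(parse_points("[(0.5, 0.25), (1.0, 1.0)]", 640, 480))
```

The resulting pixel locations could then be drawn on the image or passed to a motion planner; how they are used depends on the robot stack.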
Model Details
This model was fine-tuned from meta-llama/Llama-2-13b-chat-hf and has 13 billion parameters.
Date
This model was trained in June 2024.
Resources for More Information
- Paper: https://arxiv.org/pdf/2406.10721
- Code: https://github.com/wentaoyuan/RoboPoint
- Website: https://robo-point.github.io
Training Dataset
See wentao-yuan/robopoint-data.
Citation
If you find our work helpful, please consider citing our paper.
@article{yuan2024robopoint,
title={RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics},
author={Yuan, Wentao and Duan, Jiafei and Blukis, Valts and Pumacay, Wilbert and Krishna, Ranjay and Murali, Adithyavairavan and Mousavian, Arsalan and Fox, Dieter},
journal={arXiv preprint arXiv:2406.10721},
year={2024}
}