This model has been trained on a larger version (194 minutes total) of the commabody dataset.
It includes a VQGAN encoder/decoder fine-tuned from ImageNet, which compresses 160x256 images into a 10x16 grid of tokens.
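
To make the compression concrete, here is a minimal sketch of the arithmetic behind those sizes. The downsampling factor of 16 is inferred from the stated sizes (160/10 and 256/16), not taken from the model config, and `token_grid` is a hypothetical helper written for illustration.

```python
def token_grid(height, width, downsample=16):
    """Return the (rows, cols) token grid for an image of the given size.

    The default factor of 16 is an assumption inferred from 160x256 -> 10x16.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return height // downsample, width // downsample


rows, cols = token_grid(160, 256)
tokens_per_frame = rows * cols
print(rows, cols, tokens_per_frame)  # 10 16 160
```

Under this assumption each frame becomes 160 discrete tokens, which is the amount of context one frame would occupy in a sequence model.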

It also includes a GPT-2 model trained to predict the next frame, wheel speeds, and actions. It can be used either as a simulator or as a policy. See our blog post for more details.

You can run it on a comma body using our example script in body-jim.

