Action Decoding Errors

#4
by kiddyna - opened

Hi, I trained a model on the LIBERO spatial dataset. During inference, action decoding errors occurred. Below is an example.

Error decoding tokens: cannot reshape array of size 33 into shape (7)
Tokens: [271, 326, 340, 372, 271, 326, 1512, 1683, 297, 803, 333, 258]

I looked into the code and found that this happens because the coefficient matrix decoded by the BPE tokenizer does not match the expected shape of (time_horizon, action_dim). I was using time_horizon=5 and action_dim=7, so I would expect an array of size 35.
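The mismatch comes down to simple arithmetic: the flat list of decoded coefficients must contain exactly time_horizon * action_dim values. Below is a minimal sketch of that check in plain Python. It assumes the decoder output is a flat integer list that fills a (time_horizon, action_dim) grid row by row. The helper names `check_coefficient_count` and `reshape_coefficients` are hypothetical and not part of the FAST tokenizer.

```python
# Minimal sketch, assuming decoding yields a flat list of integer
# coefficients that must fill a (time_horizon, action_dim) grid row by row.
# Helper names are hypothetical, for illustration only.

def check_coefficient_count(coeffs, time_horizon, action_dim):
    """Return (ok, expected) for a flat list of decoded coefficients."""
    expected = time_horizon * action_dim
    return len(coeffs) == expected, expected


def reshape_coefficients(coeffs, time_horizon, action_dim):
    """Split a flat list into time_horizon rows of action_dim values.

    Raises ValueError when the count is wrong, mirroring the
    "cannot reshape" failure reported above.
    """
    ok, expected = check_coefficient_count(coeffs, time_horizon, action_dim)
    if not ok:
        raise ValueError(
            f"cannot reshape {len(coeffs)} coefficients into "
            f"({time_horizon}, {action_dim}); expected {expected}"
        )
    return [list(coeffs[i * action_dim:(i + 1) * action_dim])
            for i in range(time_horizon)]


# The reported case: 33 decoded values, but 5 * 7 = 35 are needed.
print(check_coefficient_count(list(range(33)), 5, 7))  # (False, 35)
# With a time horizon of 10 and action_dim=7 the target is 70 values.
print(check_coefficient_count(list(range(70)), 10, 7))  # (True, 70)
```

Checking the full product is stricter than a reshape that only fixes the last dimension. For example, 28 decoded values would fill four rows of 7 without error, yet still fall short of the 5 rows expected here.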

Did you encounter this problem? Since there is no guarantee that the generated tokens will decode to a coefficient matrix of the desired shape, how is this handled in practice? What time horizon was used for LIBERO training in Fig. 6 of the paper?

Thanks!

Physical Intelligence org

Hi!
Indeed, when you train models to predict action tokens, they will likely predict incorrect output shapes early on. That's why the FAST tokenizer handles decoding errors gracefully, i.e., it just warns you that the error happened and then returns a default all-zero array.
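The fallback behaviour described above can be sketched as a wrapper that warns and substitutes zeros instead of raising. This is a hypothetical illustration of the idea, not the FAST tokenizer's actual code. `decode_or_zeros` is a made-up name, and the sketch assumes decoding has already produced a flat list of coefficients.

```python
import warnings


def decode_or_zeros(coeffs, time_horizon, action_dim):
    """Hypothetical wrapper: reshape decoded coefficients into
    (time_horizon, action_dim) rows, or warn and fall back to an
    all-zero grid of the same shape when the count is wrong."""
    expected = time_horizon * action_dim
    if len(coeffs) != expected:
        warnings.warn(
            f"Error decoding tokens: got {len(coeffs)} values, "
            f"expected {expected}; returning all-zero actions"
        )
        return [[0.0] * action_dim for _ in range(time_horizon)]
    return [list(coeffs[i * action_dim:(i + 1) * action_dim])
            for i in range(time_horizon)]


# A malformed prediction (33 values) yields a 5 x 7 grid of zeros plus a warning.
fallback = decode_or_zeros(list(range(33)), time_horizon=5, action_dim=7)
```

Because a failed decode still yields an array of the right shape, the rollout loop can keep running. The trade-off is that the policy emits an all-zero action for that chunk, and how benign that is depends on how actions are represented in your setup.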
In practice, models typically learn within a few thousand gradient steps to produce only outputs of the correct length, so the number of decoding errors should drop quickly.
The time horizon for LIBERO was 10.

Thanks for your response. I have trained the model for 40k steps and believe it has converged, but I still occasionally get this error. Does that sound reasonable?
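To judge whether "occasionally" is acceptable, it can help to measure the failure rate over evaluation rollouts rather than eyeballing warnings. Below is a minimal sketch that assumes you can read off the number of decoded coefficients for each prediction. `DecodeStats` and the sample lengths are hypothetical. The example uses the LIBERO time horizon of 10 mentioned above together with action_dim=7.

```python
from dataclasses import dataclass


@dataclass
class DecodeStats:
    """Hypothetical counter for how often decoded coefficient counts
    do not match time_horizon * action_dim."""
    total: int = 0
    failures: int = 0

    def record(self, num_coeffs, time_horizon, action_dim):
        # Count every prediction; flag it if the length is wrong.
        self.total += 1
        if num_coeffs != time_horizon * action_dim:
            self.failures += 1

    @property
    def failure_rate(self):
        # Guard against division by zero before anything is recorded.
        return self.failures / self.total if self.total else 0.0


stats = DecodeStats()
for n in [70, 70, 63, 70]:  # made-up lengths, for illustration only
    stats.record(n, time_horizon=10, action_dim=7)
print(f"{stats.failures}/{stats.total} decodes failed ({stats.failure_rate:.1%})")
# prints "1/4 decodes failed (25.0%)"
```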
