physical-intelligence/fast · Action Decoding Errors

6 days ago

Hi, I trained a model on the LIBERO spatial dataset. At inference time, action decoding errors occur. Below is an example.

Error decoding tokens: cannot reshape array of size 33 into shape (7)
Tokens: [271, 326, 340, 372, 271, 326, 1512, 1683, 297, 803, 333, 258]

I looked into the code and found that this happens because the coefficient matrix decoded by the BPE tokenizer does not match the expected shape (time_horizon, action_dim). I was using time_horizon=5 and action_dim=7, so I would expect an array of size 35, but the decoded array has 33 elements.
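For reference, the mismatch can be reproduced with plain NumPy. This is only a minimal sketch: it assumes the decoder reshapes a flat coefficient array with something like `reshape(-1, action_dim)`, and `try_reshape` is a hypothetical helper, not part of the FAST code.

```python
import numpy as np

time_horizon, action_dim = 5, 7
expected_size = time_horizon * action_dim  # 35 coefficients expected


def try_reshape(flat, action_dim):
    """Hypothetical helper: attempt the reshape and report success plus any error text."""
    try:
        flat.reshape(-1, action_dim)
        return True, ""
    except ValueError as err:
        return False, str(err)


# 35 elements split evenly into rows of 7, so this succeeds.
ok_good, _ = try_reshape(np.zeros(expected_size), action_dim)

# 33 is not a multiple of 7, so NumPy raises a ValueError.
ok_bad, msg = try_reshape(np.zeros(33), action_dim)
print(ok_good, ok_bad, msg)
```

Any decoded length that is not a multiple of action_dim fails this way, and a length that is a multiple of 7 but not 35 would reshape without error while still giving the wrong time horizon.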

Have you encountered this problem? Since there is no guarantee that the generated tokens decode to a coefficient matrix of the desired shape, how is this handled in practice? Also, what time horizon was used for LIBERO training in Fig. 6 of the paper?

Thanks!

KarlP

Physical Intelligence org 6 days ago

Hi!
Indeed, when you train models to predict action tokens, they will likely predict outputs of the wrong shape early in training. That's why the FAST tokenizer handles decoding errors gracefully, i.e. it warns you that an error happened and then returns a default all-zero array.
In practice, models typically learn within a few thousand gradient steps to produce only outputs of the correct length, so the number of decoding errors should drop quickly.
The time horizon for LIBERO was 10.
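The fallback described above can be sketched roughly as follows. This is an illustration only, not the FAST tokenizer's actual code: `decode_or_zeros` is a hypothetical helper, and it assumes the decoded coefficients arrive as a flat array.

```python
import warnings

import numpy as np


def decode_or_zeros(flat_coeffs, time_horizon, action_dim):
    """Hypothetical helper: reshape to (time_horizon, action_dim), or warn and return zeros."""
    flat = np.asarray(flat_coeffs, dtype=np.float32)
    if flat.size != time_horizon * action_dim:
        warnings.warn(
            f"Decoded {flat.size} values, expected {time_horizon * action_dim}; "
            "returning an all-zero action chunk."
        )
        return np.zeros((time_horizon, action_dim), dtype=np.float32), False
    return flat.reshape(time_horizon, action_dim), True


# A 33-element decode falls back to zeros; a 35-element decode is reshaped.
bad_actions, bad_ok = decode_or_zeros(np.arange(33), time_horizon=5, action_dim=7)
good_actions, good_ok = decode_or_zeros(np.arange(35), time_horizon=5, action_dim=7)
```

Returning a success flag alongside the array makes it easy to count how often the fallback fires during evaluation, which helps judge whether the error rate is actually going down.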

kiddyna

6 days ago (edited 6 days ago)

Thanks for your response. I have trained the model for 40k steps, and I believe it has converged, but I still occasionally get this error. Does that sound reasonable?