Incorrect comments in example
#4 opened by mjspeck
Under `FlavaForPreTraining`, the value of `outputs.multimodal_embeddings` is actually `None`, as opposed to what the adjacent comment implies: `# Batch size X (Number of image patches + Text Sequence Length + 3) X Hidden size => 2 X 275 x 768`. Why? It doesn't seem like the README author expected this. I assume it has to do with this line: `inputs.bool_masked_pos.zero_()`
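One quick way to see what the model actually returned is to walk the output object's fields and check which ones are `None`. A minimal stdlib sketch of that check follows; the `PreTrainingOutput` dataclass and its field names here are a hypothetical stand-in for illustration, not the real `FlavaForPreTrainingOutput`, and the tuple "shapes" stand in for tensors:

```python
# Hypothetical stand-in for a transformers-style ModelOutput: optional fields
# stay None when the corresponding computation branch was skipped.
from dataclasses import dataclass, fields
from typing import Optional, Tuple

@dataclass
class PreTrainingOutput:
    image_embeddings: Optional[Tuple[int, ...]] = None
    multimodal_embeddings: Optional[Tuple[int, ...]] = None        # unmasked pass
    multimodal_masked_embeddings: Optional[Tuple[int, ...]] = None  # masked pass

def populated_fields(out) -> list:
    """Return the names of output fields that are actually populated (not None)."""
    return [f.name for f in fields(out) if getattr(out, f.name) is not None]

# Simulate the reported situation: masked embeddings are present, but the
# plain multimodal_embeddings slot was never filled, so it is None.
out = PreTrainingOutput(
    image_embeddings=(2, 197, 768),
    multimodal_masked_embeddings=(2, 275, 768),
)
print(populated_fields(out))  # ['image_embeddings', 'multimodal_masked_embeddings']
print(out.multimodal_embeddings)  # None
```

Running the same kind of field walk on the real output would confirm whether only the masked variant of the multimodal embeddings is being computed in the pretraining setup.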