What is the significance of the parameters input_size and max_num?
How does changing them affect the output or accuracy? I read the report for VL1.5, but it only mentioned that they can be changed and that max_num was 12 in training. It did not explain what difference changing them makes.
Hello, these parameters determine the resolution of the input image, so changing them affects the accuracy.
Right, but how? I've tried changing input_size, and everything other than 448 crashed the code. Changing max_num seemed to do nothing. Does a bigger max_num give higher resolution, or a smaller one?
I found that loading in 8-bit actually performs better on my VQA task, so I think something is wrong with what I'm doing...
input_size represents the size of each image tile, which is 448 and cannot be modified. You can control the resolution of the input image by adjusting max_num; the larger the max_num, the larger the input image will be.
So to make sure I understand it, if max_num is 1 my image gets resized to 448x448 and is only a single tile. If my max_num is like 16 it gets turned into at most 16 448x448 images, depending on the original size?
yes
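For anyone who wants to see the effect of max_num in numbers, here is a rough sketch of how a tile grid could be chosen. It is not the repository's code: the function name pick_tile_grid is made up for illustration, and the selection rule (closest aspect ratio, with ties going to the larger grid when the image has enough pixels) is an assumption about how the tiling behaves. The only facts taken from the thread are the 448 tile size and that max_num caps the number of tiles.

```python
def pick_tile_grid(width, height, max_num=12, tile=448):
    """Hypothetical helper: return (cols, rows) with cols * rows <= max_num.

    Assumed rule: choose the grid whose aspect ratio is closest to the
    image's; on an exact tie, move to the larger grid only if the image
    has more than half as many pixels as that grid would hold.
    """
    aspect = width / height
    # Every grid shape that fits within the tile budget, smallest first.
    grids = sorted(
        ((c, r)
         for c in range(1, max_num + 1)
         for r in range(1, max_num + 1)
         if c * r <= max_num),
        key=lambda g: (g[0] * g[1], g),
    )
    best, best_diff = (1, 1), float("inf")
    for cols, rows in grids:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile * tile * cols * rows:
            best = (cols, rows)
    return best


if __name__ == "__main__":
    tile = 448
    for max_num in (1, 12, 16):
        cols, rows = pick_tile_grid(1920, 1080, max_num, tile)
        # The image would be resized to (cols * tile) x (rows * tile)
        # and then cut into cols * rows tiles of tile x tile pixels.
        print(max_num, (cols, rows), cols * rows, (cols * tile, rows * tile))
```

Under these assumptions, a 1920x1080 image becomes a single 448x448 tile with max_num=1, a 4x2 grid (8 tiles, 1792x896) with max_num=12, and a 5x3 grid (15 tiles, 2240x1344) with max_num=16. A small 448x448 image stays at one tile whatever max_num is, which might be why changing max_num can look like it does nothing on small inputs.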