Hi there, I've been using VLMEvalKit to evaluate the post-trained model. All other benchmarks look fine, but the MMStar score is lower than expected. Could you shed some light on why this might be the case?