This is a quantization of Yi-VL-34B, including its visual transformer.
You currently need to apply this PR to make it work: https://github.com/ggerganov/llama.cpp/pull/5093 - it adds the additional normalization steps to the projection.
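For reference, a rough sketch of building llama.cpp with that PR applied and running the quantized model through the llava example. The file names below are placeholders, not the actual files in this repo; substitute the quantized model and mmproj files you downloaded.

```shell
# Build llama.cpp with PR #5093 applied (assumes the GitHub CLI is installed;
# otherwise fetch and check out the PR branch manually)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
gh pr checkout 5093
make llava-cli

# Run inference - model and mmproj file names are placeholders,
# replace with the files from this repository
./llava-cli -m yi-vl-34b-q5_k.gguf \
    --mmproj mmproj-yi-vl-34b-f16.gguf \
    --image example.jpg \
    -p "Describe this image."
```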
Yi-VL-34B is prone to hallucinations; to me it looks like a rushed release, as if something did not go right in training. However, while the 6B was the second-worst llava model I've tested, the 34B did show some strengths.