arxiv:2412.02611
Shijia Yang
shijiay
AI & ML interests
None yet
Recent Activity
upvoted a paper about 1 month ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
authored a paper about 1 month ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
commented a paper 4 months ago
Law of Vision Representation in MLLMs
Organizations
None yet
models
27
shijiay/llava_clip224_stage1 • Image-Text-to-Text • Updated • 18
shijiay/llava_clip224_stage2 • Image-Text-to-Text • Updated • 25
shijiay/llava_dinov2_stage2 • Image-Text-to-Text • Updated • 18 • 1
shijiay/llava_clip_stage1 • Image-Text-to-Text • Updated • 15
shijiay/llava_clip_stage2 • Image-Text-to-Text • Updated • 43
shijiay/llava_openclip_stage1 • Image-Text-to-Text • Updated • 9
shijiay/llava_openclip_stage2 • Image-Text-to-Text • Updated • 11
shijiay/llava_siglip_stage1 • Image-Text-to-Text • Updated • 15
shijiay/llava_siglip_stage2 • Image-Text-to-Text • Updated • 23
shijiay/llava_sdim_stage1 • Image-Text-to-Text • Updated • 7
datasets
None public yet