Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 4 days ago • 55
Article Perceiver IO: a scalable, fully-attentional model that works on any modality Dec 15, 2021 • 2
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks Paper • 1908.10084 • Published Aug 27, 2019 • 4
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 13 days ago • 43
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published 18 days ago • 53
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9 • 20
Article Selective fine-tuning of Language Models with Spectrum By anakin87 • 19 days ago • 26
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published 24 days ago • 50
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 109
Article Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap Jun 19 • 11
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 23
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19 • 42
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 87
Article Build Agentic Workflow using OpenAGI and HuggingFace models By lucifertrj • Jun 26 • 8
Article Building a Vision Mixture-of-Expert Model from several fine-tuned Phi-3-Vision Models By mjbuehler • Jun 12 • 6
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
Article Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖 By m-ric • Jun 20 • 26
Article seemore: Implement a Vision Language Model from Scratch By AviSoori1x • Jun 23 • 57
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 160
Article DS-MoE: Making MoE Models More Efficient and Less Memory-Intensive By bpan • Apr 9 • 29
A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data Paper • 2403.04866 • Published Mar 7 • 5
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression Paper • 2311.10794 • Published Nov 17, 2023 • 24