Hugo Laurençon

HugoLaurencon

AI & ML interests

None yet

Recent Activity

new activity 2 days ago

HuggingFaceM4/idefics2-8b: Seems like the user prompt is ignored

Articles

Docmatix - a huge dataset for Document Visual Question Answering

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Putting ethical principles at the core of research lifecycle

HugoLaurencon's activity

upvoted 2 papers 2 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 6 days ago • 328

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 124

upvoted 2 papers 10 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 12 days ago • 131

Phi-4 Technical Report

Paper • 2412.08905 • Published 14 days ago • 92

upvoted a paper 16 days ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published 20 days ago • 10

upvoted a paper 20 days ago

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Paper • 2412.04280 • Published 20 days ago • 13

upvoted 3 papers about 1 month ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22 • 56

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21 • 43

Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11 • 20

upvoted a paper about 2 months ago

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7 • 49

upvoted 5 papers 2 months ago

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24 • 40

WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published Oct 24 • 11

MoH: Multi-Head Attention as Mixture-of-Head Attention

Paper • 2410.11842 • Published Oct 15 • 20

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15 • 16

upvoted 5 papers 3 months ago

Diversity-Rewarded CFG Distillation

Paper • 2410.06084 • Published Oct 8 • 10

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2 • 26

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Paper • 2402.19474 • Published Feb 29 • 2

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published Sep 20 • 68

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19 • 47