Andres Marafioti

andito

AI & ML interests

Multimodal models, VLM and TTS

Recent Activity

upvoted an article 3 days ago

Replicating DeepSeek R1 for Information Extraction

liked a dataset 4 days ago

fixie-ai/gigaspeech

posted an update 4 days ago

Extremely bullish on @CohereForAI's Aya Vision (8B & 32B)
- New SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated into transformers from Day 0!

Efficient multimodal models are here to stay!!🔥 Check out their blog! https://huggingface.co./blog/aya-vision

andito's activity

upvoted an article 3 days ago

Article

Replicating DeepSeek R1 for Information Extraction

Jan 31

• 38

upvoted an article 5 days ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

6 days ago

• 57

upvoted an article 17 days ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

18 days ago

• 196

upvoted a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 199

upvoted 3 articles about 1 month ago

Article

Open-source DeepResearch – Freeing our search agents

Feb 4

• 1.14k

Article

Fixing Gradient Accumulation

Oct 16, 2024

• 50

Article

We now support VLMs in smolagents!

Jan 24

• 91

upvoted an article about 2 months ago

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Jan 23

• 146

upvoted a collection about 2 months ago

SmolVLM 256M & 500M

Collection

Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 17 days ago • 70

upvoted a paper about 2 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 136

upvoted an article about 2 months ago

Article

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 31

upvoted 2 papers 3 months ago

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published Dec 13, 2024 • 17

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 140

upvoted a collection 3 months ago

Nov 29 Releases 🌲🌲

Collection

25 items • Updated Dec 2, 2024 • 10

upvoted 3 articles 5 months ago

Article

Llama 3.2 in Keras

Oct 21, 2024

• 12

Article

Welcome, Gradio 5

Oct 9, 2024

• 128

Article

Tool Use, Unified

Aug 12, 2024

• 89

upvoted 3 articles 6 months ago

Article

FineVideo: behind the scenes

Sep 23, 2024

• 29

Article

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

May 1, 2024

• 73

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024

• 225