sllhf (SLLHF)

m-ric

posted an update 6 days ago

Post

1627

After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, that turn a sentence in a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B parameters supermodel (just a few hundred millions), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category cumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

𝗧𝗟;𝗗𝗥:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the game of encoder models:
It beats previous standard DeBERTaV3 for 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co./blog/modernbert

1 reply

·

m-ric

posted an update 6 days ago

Post

2024

𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

1 reply

·

regisss

posted an update 7 days ago

Post

867

Nice to see day 1 support of Falcon 3 on Gaudi with Optimum Habana!

👉 https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-falcon-3-fdn-models.html

Xenova

posted an update 7 days ago

Post

1894

Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web

m-ric

posted an update 12 days ago

Post

2153

𝗣𝗼𝘁𝗲𝗻𝘁𝗶𝗮𝗹 𝗽𝗮𝗿𝗮𝗱𝗶𝗴𝗺 𝘀𝗵𝗶𝗳𝘁 𝗶𝗻 𝗟𝗟𝗠𝘀: 𝗻𝗲𝘄 𝗽𝗮𝗽𝗲𝗿 𝗯𝘆 𝗠𝗲𝘁𝗮 𝗰𝗹𝗮𝗶𝗺𝘀 𝘁𝗵𝗮𝘁 𝘄𝗲 𝗰𝗮𝗻 𝗴𝗲𝘁 𝗿𝗶𝗱 𝗼𝗳 𝘁𝗼𝗸𝗲𝗻𝗶𝘇𝗲𝗿𝘀! 🥳

Current LLMs process text by first splitting it into tokens. They use a module named "tokenizer", that -spl-it-s- th-e- te-xt- in-to- arbitrary tokens depending on a fixed dictionnary.
On the Hub you can find this dictionary in a model's files under tokenizer.json.

➡️ This process is called BPE tokenization. It is suboptimal, everyone says it. It breaks text into predefined chunks that often fail to capture the nuance of language. But it has been a necessary evil in language models since their inception.

💥 In Byte Latent Transformer (BLT), Meta researchers propose an elegant solution by eliminating tokenization entirely, working directly with raw bytes while maintaining efficiency through dynamic "patches."

This had been tried before with different byte-level tokenizations, but it's the first time that an architecture of this type scales as well as BPE tokenization. And it could mean a real paradigm shift! 👏👏

🏗️ 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲:
Instead of a lightweight tokenizer, BLT has a lightweight encoder that process raw bytes into patches. Then the patches are processed by the main heavy-duty transformers as we do normally (but for patches of bytes instead of tokens), before converting back to bytes.

🧩 𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴:
Instead of fixed tokens, BLT groups bytes based on their predictability (measured by entropy) - using more compute for complex sequences and efficiently handling simple ones. This allows efficient processing while maintaining byte-level understanding.

I hope this breakthrough is confirmed and we can get rid of all the tokenizer stuff, it will make model handling easier!

Read their paper here 👉 https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf

2 replies

·

m-ric

posted an update 14 days ago

Post

2415

💥 𝗚𝗼𝗼𝗴𝗹𝗲 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝘀 𝗚𝗲𝗺𝗶𝗻𝗶 𝟮.𝟬, 𝘀𝘁𝗮𝗿𝘁𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗮 𝗙𝗹𝗮𝘀𝗵 𝗺𝗼𝗱𝗲𝗹 𝘁𝗵𝗮𝘁 𝘀𝘁𝗲𝗮𝗺𝗿𝗼𝗹𝗹𝘀 𝗚𝗣𝗧-𝟰𝗼 𝗮𝗻𝗱 𝗖𝗹𝗮𝘂𝗱𝗲-𝟯.𝟲 𝗦𝗼𝗻𝗻𝗲𝘁! And they start a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

➡️ If the price is on par with previous Flash iteration ($0.30 / M tokens, to compare with GPT-4o's $1.25) the competition will have a big problem with this 4x cheaper model that gets better benchmarks 🤯

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on WebVoyager benchmark, this is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blogpost here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/

m-ric

posted an update 14 days ago

Post

1790

𝐒𝐜𝐚𝐥𝐢𝐧𝐠 𝐥𝐚𝐰𝐬 𝐚𝐫𝐞 𝐧𝐨𝐭 𝐝𝐞𝐚𝐝 𝐲𝐞𝐭! New blog post suggests Anthropic might have an extremely strong Opus-3.5 already available, but is not releasing it to keep their edge over the competition. 🧐

❓Since the release of Opus-3.5 has been delayed indefinitely, there have been lots of rumors and articles about LLMs plateauing. Scaling laws, the main powering factor of the LLM competence increase, could have stopped, according to these rumors, being the cause of this stalling of progress.

These rumors were quickly denied by many people at the leading LLM labs, including OpenAI and Anthropic. But these people would be expected to hype the future of LLMs even if scaling laws really plateaued, so the jury is still out.

🗞️ This new article by Semianalysis (generally a good source, specifically on hardware) provides a counter-rumor that I find more convincing:

➡️ Maybe scaling laws still work, Opus-3.5 is ready and as good as planned, but they just don't release it because the synthetic data it helps provide can bring cheaper/smaller models Claude and Haiku up in performance, without risking to leak this precious high-quality synthetic data to competitors.

Time will tell! I feel like we'll know more soon.

Read the article: https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-infrastructure-orion-and-claude-3-5-opus-failures/

1 reply

·

m-ric

posted an update 16 days ago

Post

2220

Last week was crazy in OS AI, with important models and datasets releases every day.

Here are the most important ones I've pinned:

🌎 Cohere relased GLobal-MMLU, a multilingual version of MMLU, to evaluate AI models' world knowledge in many languages!

🦙 Meta released Llama-3.3-70B-Instruct, a 70B model that's on par with Llama-3.1-405B-Instruct, GPT-4o and Claude. Probably my new go-to for agentic workflows.

🔉 FishAudio released fish-speech-1.5, multilingual text to speech model

🎨 Microsoft Research released TRELLIS, an extremely impressive image-to-3D model, which you can try here: JeffreyXiang/TRELLIS

📚 Yesterday, Hugging Face release FineWeb 2, a new version that extends the previous FineWeb to over 1000 languages, including extended coverage in Russina, Mandarin, German, Japanese, Spanish, French, so a huge, high-quality dataset of > 3 trillion words! HuggingFaceFW/fineweb-2

Now let's go build to make this week as productive as last one!

Xenova

posted an update 17 days ago

Post

2518

Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

reach-vb

posted an update 18 days ago

Post

3202

VLMs are going through quite an open revolution AND on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLabs w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co./collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co./jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co./vikhyatk/

What a time to be alive! 🔥

dvilasuero

authored a paper 19 days ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published 21 days ago • 17

dvilasuero

posted an update 19 days ago

Post

2259

🌐 Announcing Global-MMLU: an improved MMLU Open dataset with evaluation coverage across 42 languages, built with Argilla and the Hugging Face community.

Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior Técnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.

🏷️ +200 contributors used Argilla MMLU questions where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!

Thanks to this annotation process, the open dataset contains two subsets:

1. 🗽 Culturally Agnostic: no specific regional, cultural knowledge is required.
2. ⚖️ Culturally Sensitive: requires dialect, cultural knowledge or geographic knowledge to answer correctly.

Moreover, we provide high quality translations of 25 out of 42 languages, thanks again to the community and professional annotators leveraging Argilla on the Hub.

I hope this will ensure a better understanding of the limitations and challenges for making open AI useful for many languages.

Dataset: CohereForAI/Global-MMLU

m-ric

posted an update 21 days ago

Post

1467

𝗦𝗵𝗼𝘄𝗨𝗜: 𝗮 𝘀𝗺𝗮𝗹𝗹 𝗲𝗻𝗱-𝘁𝗼-𝗲𝗻𝗱 𝗮𝗴𝗲𝗻𝘁 𝘁𝗵𝗮𝘁 𝗰𝗮𝗻 𝗻𝗮𝘃𝗶𝗴𝗮𝘁𝗲 𝗮𝗻𝘆 𝗨𝗜 𝗮𝗻𝗱 𝗼𝘂𝘁𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝘀 𝗺𝘂𝗰𝗵 𝗯𝗶𝗴𝗴𝗲𝗿 𝘀𝘆𝘀𝘁𝗲𝗺𝘀! 📲

A team from NUS and Microsoft just released an agent that can act on any UI (Desktop, Android, Web) without needing additional text information. It works extremely well : they applied their method on a tiny Qwen2-VL-2B, and they managed to beat methods that use either much more powerful vision models (like GPT-4V) without using any additional info (e.g. leveraging the DOM of a webpage) like previous methods did ! 👏👏

They started from the idea that most existing methods rely heavily on text, which makes them less generalizable, while letting aside rich UI structure that user actually rely on when navigating this interfaces.

⚙️ They put several good ideas to work:

💡 Simplify screenshots to the max:
They prune a lot the heavy visual content of UI screenshots, by removing cloned image patches (like any vast patch of the same color will be reduced to a small patch, while maintaining positional embeddings), then group patches from the same GUI elements together to simplify even further

💡 Build a truly generalist dataset:
To train a general UI agent, you need trajectories from each possible UI, and express them in a common language. Authors merge datasets like OmniAct for Desktop, Mind2Web for websites, AMEX for Android trajectories to create a high-quality and diverse dataset.

➡️ Nice results ensued:
They fine-tune a tiny Qwen-2-VL-2B on their method, and it reaches SOTA on several task (element identification, web navigation), even beating methods that either use additional info from the DOM or use much bigger VLMS like GPT-4v! 🏆

And performance could certainly jump with a slightly bigger vision model. Let's hope the community builds this soon! 🚀

Paper added to my "Agents" collection 👉 m-ric/agents-65ba776fbd9e29f771c07d4e

m-ric

posted an update 22 days ago

Post

1206

Need a measurement for traction of a GitHub repo, a more reliable one than Github star history? (which is a bit too hype-driven) 📈

➡️ I've made a Space to visualize PyPI downloads.

Try it here 👉 m-ric/package-download-history

1 reply

·

m-ric

posted an update 23 days ago

Post

1269

🤖 𝗔𝗱𝗼𝗯𝗲'𝘀 𝗰𝗼𝗱𝗲-𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝘁𝗵𝗲 𝘁𝗼𝗽 𝗼𝗳 𝗚𝗔𝗜𝗔 𝗹𝗲𝗮𝗱𝗲𝗿𝗯𝗼𝗮𝗿𝗱 - and their paper cites my work!

💡 Reminder: In short, Agentic systems are a vehicle in which you put your LLM to allow it access to the outside world.

➡️ The team of researchers at Adobe started from the idea that current agentic systems lack the ability to define their own tools. So they decided to make an agent that writes actions as code, thus allowing it to write python functions that can be re-used later as tools!

Here's what the LLM generations can look like with the proper prompt:

Thought: I need to access the excel file using a different method.
Action:

def access_excel_file(file_path)
	... # rest of the code (the agent does writes it, but I don't have room in this post)
	return rows

Then your system executes this and appends the observation to the agent's memory.

Why is this code formulation better than classical tool use formulation as JSON? The paper explains:

"Most existing work uses text or JSON as the representation of actions, which significantly lacks the two criteria mentioned earlier: generality and composability. In contrast, DynaSaur can utilize available actions or create new ones if necessary, using code as a unified representation. In principle, acting with code enables agents to solve any Turing-complete problem."

The idea of using code is not new: in fact, we do it in transformers.agents (thus the citation that I got). They implementation adds further refinements, like using RAG to retrieve relevant functions before generating an action, which increases performance further.

And they observe that code agents perform much better, reaching the top of GAIA leaderboard! 🥇

Go take a look, it's really clear and informative!

Paper added to my agents collection 👉 m-ric/agents-65ba776fbd9e29f771c07d4e

m-ric

posted an update 26 days ago

Post

2376

Single most important thing to do today: 𝗴𝗼 𝘁𝗿𝘆 𝗤𝘄𝗤 𝗼𝗻 𝗛𝘂𝗴𝗴𝗶𝗻𝗴 𝗖𝗵𝗮𝘁!

👉 https://huggingface.co./chat/models/Qwen/QwQ-32B-Preview

2 replies

·

victor

posted an update 26 days ago

Post

1783

Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention

2 replies

·

Xenova

posted an update 27 days ago

Post

3917

We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

m-ric

posted an update about 1 month ago

Post

1079

🗞️ 𝗦𝘁𝗮𝘁𝗲 𝗼𝗳 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗔𝗜 𝟮𝟬𝟮𝟰: 𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗲𝗮𝘁𝗶𝗻𝗴 𝘂𝗽 𝗢𝗽𝗲𝗻𝗔𝗜, 𝗔𝗴𝗲𝗻𝘁𝘀 𝗿𝗮𝗺𝗽 𝘂𝗽 𝘁𝗼 𝟭𝟮% 𝗼𝗳 𝘂𝘀𝗲-𝗰𝗮𝘀𝗲𝘀, 𝗼𝗽𝗲𝗻 𝗺𝗼𝗱𝗲𝗹𝘀 𝗺𝗮𝗸𝗲 𝟭𝟵% 𝗼𝗳 𝘂𝘀𝗮𝗴𝗲

Menlo Ventures surveyed 600 enterprise IT decision-makers for their 2024 report. They reveal that AI spending surged to $13.8 billion this year, more than 6x the $2.3 billion spent in 2023!

Companies are shifting from experimentation to serious implementation.

👷 Top enterprise use cases by adoption:
‣ Code copilots (51%)
- GitHub Copilot hit $300M revenue run rate
‣ Support chatbots (31%)
‣ RAG (28%)
‣ Data extraction/transformation (27%)
‣ Meeting summarization (25%)

📈 Market dynamics:
‣ OpenAI's enterprise share dropped from 50% to 34% 👎
‣ Anthropic doubled presence from 12% to 24% 🚀
‣ Open-source makes up 19% of usage 🤗

😬 Implementation challenges:
‣ 26% failed due to unexpected implementation costs
‣ 21% failed due to data privacy issues
‣ 18% failed due to disappointing ROI
‣ 15% failed due to hallucinations

Read the full report here 👉 https://menlovc.com/2024-the-state-of-generative-ai-in-the-enterprise/

victor

posted an update about 1 month ago

Post

2255

Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFMPEG and this can be quite complex but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT4 and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore, let's go open weights 🚀.

SLLHF

AI & ML interests

Recent Activity

sllhf's activity

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

AI & ML interests

Recent Activity

Team members 136

sllhf's activity