Corwin Black

Mescalamba

AI & ML interests

image denoising

Recent Activity

replied to m-ric's post 5 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning the "decoder version of the Transformer architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs). It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs. Many applications use BERT-family models: the top models in this category have accumulated millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn has just introduced BERT's replacement: ModernBERT.

TL;DR:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.

🥇 As a result, the model tops the game of encoder models: it beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co./blog/modernbert
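
For readers unfamiliar with the "Replace GeLU with GeGLU" item in the post above, here is a minimal pure-Python sketch of the general idea: a GeGLU layer multiplies a GeLU-activated projection of the input by a second, ungated linear projection. The function names (`gelu`, `matvec`, `geglu`) and the tiny weight matrices are hypothetical and chosen only for illustration; this is not ModernBERT's actual implementation.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def geglu(x, w_act, w_gate):
    """GeGLU sketch: GELU(W_act @ x) * (W_gate @ x), element-wise.

    w_act and w_gate are illustrative weight matrices (assumptions),
    standing in for the two learned projections of a gated feed-forward layer.
    """
    activated = [gelu(v) for v in matvec(w_act, x)]
    gate = matvec(w_gate, x)
    return [a * g for a, g in zip(activated, gate)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With identity projections, each output is gelu(x_i) * x_i.
    print(geglu([1.0, 2.0], identity, identity))
```

Compared with a plain GeLU feed-forward layer, the extra linear branch lets the layer modulate each activated feature, at the cost of a second projection.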

Organizations

None yet

Mescalamba's activity

Very good!

#1 opened 11 days ago by
Mescalamba
New activity in Djrango/Qwen2vl-Flux 29 days ago
New activity in migtissera/Llama-3-8B-Synthia-v3.5 about 1 month ago

It's really good, but...

#3 opened about 1 month ago by
Mescalamba
New activity in city96/flux.1-lite-8B-alpha-gguf 2 months ago

need fp8 for speed

#1 opened 2 months ago by
Ai11Ali