Cross

dillfrescott

AI & ML interests

AI, anime, computers

Organizations

The Waifu Research Department

dillfrescott's activity

reacted to merve's post with 👍👀 about 7 hours ago
New activity in Qwen/QVQ-72B-Preview about 7 hours ago

GGUF weights? (3 replies)
#1 opened about 13 hours ago by luijait
reacted to AdinaY's post with ❤️👀🔥 about 9 hours ago
QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
liked a Space about 9 hours ago
reacted to m-ric's post with 👍🔥 4 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "the decoder version of the Transformer architecture", but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.
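To make that concrete, here is a minimal sketch of using an encoder to embed a sentence. It assumes a transformers release recent enough to include ModernBERT and uses the released answerdotai/ModernBERT-base checkpoint; mean pooling is one common (not the only) way to collapse token states into a single sentence vector.

```python
# Minimal sketch: turn a sentence into a vector with an encoder model.
# Assumes a recent transformers release with ModernBERT support.
import torch
from transformers import AutoTokenizer, AutoModel

name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Encoders turn a sentence into a vector.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)

# Mean-pool the token states (masking out padding) into one sentence embedding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (1, dim)
print(embedding.shape)
```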

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models; the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

π—§π—Ÿ;𝗗π—₯:
πŸ›οΈ Architecture changes:
β‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
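For illustration, here is a minimal NumPy sketch of the RoPE idea mentioned above: each pair of channels is rotated by an angle proportional to the token's position, so query-key dot products end up depending on relative offsets. This is the generic rotate-half formulation, not ModernBERT's exact code.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # one frequency per channel pair
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)   # 8 positions, one 64-dim attention head
q_rot = rope(q)
```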

🥇 As a result, the model tops the field of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
  • 1 reply
reacted to etemiz's post with 👍👀 4 days ago
As more synthetic datasets are made, we move slowly away from human alignment.
  • 4 replies
reacted to wenhuach's post with 👀 6 days ago
AutoRound has demonstrated strong results even at 2-bit precision for VLMs like Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
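As a rough illustration of what "int2-sym" means, here is a toy sketch of naive symmetric 2-bit round-to-nearest quantization of a weight tensor. AutoRound itself goes further by tuning the rounding and clipping with sign-gradient descent, which this sketch omits.

```python
import torch

def quantize_sym(w: torch.Tensor, bits: int = 2):
    """Naive symmetric quantization: one scale, signed levels in [-qmax, qmax].
    For 2 bits, qmax = 1, so each weight becomes -1, 0, or +1 times the scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(64, 64)
q, s = quantize_sym(w)
print("mean abs error:", (w - dequantize(q, s)).abs().mean().item())
```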
  • 4 replies