Cross

dillfrescott

AI & ML interests

AI, anime, computers

Organizations

The Waifu Research Department

dillfrescott's activity

reacted to merve's post with 👍👀 about 7 hours ago
New activity in Qwen/QVQ-72B-Preview about 7 hours ago

GGUF weights? (3 replies)
#1 opened about 13 hours ago by luijait
reacted to AdinaY's post with ❤️👀🔥 about 9 hours ago
QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba_Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
liked a Space about 9 hours ago
reacted to m-ric's post with 👍🔥 4 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗

We talk a lot about ✨Generative AI✨, meaning "the decoder version of the Transformer architecture", but this is only one way to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.
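To make that concrete, here is a minimal sketch of using an encoder to embed a sentence. It assumes a transformers release recent enough to include ModernBERT and uses the released answerdotai/ModernBERT-base checkpoint; mean pooling is one common (not the only) way to collapse token states into a single sentence vector.

```python
# Minimal sketch: turn a sentence into a vector with an encoder model.
# Assumes a recent transformers release with ModernBERT support.
import torch
from transformers import AutoTokenizer, AutoModel

name = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Encoders turn a sentence into a vector.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)

# Mean-pool the token states (masking out padding) into one sentence embedding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # (1, dim)
print(embedding.shape)
```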

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models; the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

π—§π—Ÿ;𝗗π—₯:
πŸ›οΈ Architecture changes:
β‡’ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- Replace GeLU with GeGLU,
- Use Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
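For illustration, here is a minimal NumPy sketch of the RoPE idea mentioned above: each pair of channels is rotated by an angle proportional to the token's position, so query-key dot products end up depending on relative offsets. This is the generic rotate-half formulation, not ModernBERT's exact code.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # one frequency per channel pair
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(8, 64)   # 8 positions, one 64-dim attention head
q_rot = rope(q)
```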

🥇 As a result, the model tops the field of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
  • 1 reply
reacted to etemiz's post with 👍👀 4 days ago
As more synthetic datasets are made, we move slowly away from human alignment.
  • 4 replies
reacted to wenhuach's post with 👀 6 days ago
AutoRound has demonstrated strong results even at 2-bit precision for VLMs like Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
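As a rough illustration of what "int2-sym" means, here is a toy sketch of naive symmetric 2-bit round-to-nearest quantization of a weight tensor. AutoRound itself goes further by tuning the rounding and clipping with sign-gradient descent, which this sketch omits.

```python
import torch

def quantize_sym(w: torch.Tensor, bits: int = 2):
    """Naive symmetric quantization: one scale, signed levels in [-qmax, qmax].
    For 2 bits, qmax = 1, so each weight becomes -1, 0, or +1 times the scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(64, 64)
q, s = quantize_sym(w)
print("mean abs error:", (w - dequantize(q, s)).abs().mean().item())
```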
  • 4 replies