Bils (Bilel Aroua)

The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
📏 2 base model variants mentioned: 2B and 8B sizes

📐 New architecture at every level of abstraction:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer (MMDiT): bye cross-attention 👋, text and image tokens now go through joint attention
- 🆕 Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model
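To make the rectified-flow bullet concrete, here is a small sketch. It assumes the convention described in the SD3 paper, z_t = (1 - t) x_0 + t ε with t = 0 at the data and t = 1 at pure noise, and a network that regresses the velocity ε - x_0. The helper names `interpolate`, `velocity_target` and `euler_sample` are illustrative, and the "model" is a hypothetical oracle, not a trained MMDiT.

```python
import random
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, eps: Vector, t: float) -> Vector:
    """Forward process: point at time t on the straight line from data (t=0) to noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]


def velocity_target(x0: Vector, eps: Vector) -> Vector:
    """Regression target dz_t/dt = eps - x0, identical at every t on the path."""
    return [b - a for a, b in zip(x0, eps)]


def euler_sample(eps: Vector, velocity_fn: Callable[[Vector, float], Vector], steps: int) -> Vector:
    """Integrate dz/dt = v(z, t) backwards from t=1 (noise) to t=0 (data) with Euler steps."""
    z = list(eps)
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
        t -= dt
    return z


if __name__ == "__main__":
    random.seed(0)
    x0 = [random.gauss(0.0, 1.0) for _ in range(4)]   # stand-in for a VAE latent
    eps = [random.gauss(0.0, 1.0) for _ in range(4)]  # standard Gaussian noise
    # Hypothetical "perfect" model: it already knows x0 and eps, so it returns the exact target.
    oracle = lambda z, t: velocity_target(x0, eps)
    for steps in (1, 4, 28):
        z = euler_sample(eps, oracle, steps)
        err = max(abs(a - b) for a, b in zip(z, x0))
        print(f"{steps:>2} steps -> max error {err:.1e}")
```

Because the oracle velocity is constant along a straight path, even a single Euler step recovers x_0 up to float rounding. A trained network only approximates the average of ε - x_0 given z_t, so its paths are not perfectly straight, which is why real samplers still take several steps.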

📄 3 text encoders: 2 CLIPs and one T5-XXL; plug-and-play: dropping the largest one (T5-XXL) at inference keeps results competitive
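One way to picture the plug-and-play encoders is how their outputs could be stacked into a single text-token sequence. This sketch assumes, based on the paper's description, that the CLIP-L and CLIP-G token embeddings are joined channel-wise, zero-padded to T5's width, and concatenated with the T5 tokens along the sequence axis, and that a removed encoder is represented by zeros. The helper names and the 77-token lengths are illustrative assumptions.

```python
from typing import List, Optional

Matrix = List[List[float]]  # rows = tokens, columns = channels


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def concat_channels(a: Matrix, b: Matrix) -> Matrix:
    """Join two token-aligned embeddings feature-wise (same number of tokens)."""
    assert len(a) == len(b), "both encoders must produce the same number of tokens"
    return [ra + rb for ra, rb in zip(a, b)]


def pad_channels(m: Matrix, width: int) -> Matrix:
    """Zero-pad every token vector up to `width` channels."""
    return [row + [0.0] * (width - len(row)) for row in m]


def build_context(clip_l: Matrix, clip_g: Matrix, t5: Optional[Matrix],
                  t5_tokens: int, t5_width: int) -> Matrix:
    """Assemble the text-token sequence; t5=None simulates removing the T5 encoder."""
    clip = pad_channels(concat_channels(clip_l, clip_g), t5_width)
    if t5 is None:
        t5 = zeros(t5_tokens, t5_width)  # assumption: a dropped encoder contributes zeros
    return clip + t5  # concatenate along the token (sequence) axis


if __name__ == "__main__":
    # Assumed sizes: 77 tokens each; CLIP-L 768, CLIP-G 1280, T5-XXL 4096 channels.
    ctx = build_context(zeros(77, 768), zeros(77, 1280), None, t5_tokens=77, t5_width=4096)
    print(len(ctx), len(ctx[0]))  # 154 4096
```

Since the sequence shape is the same with or without T5, the rest of the model does not need to change when the big encoder is left out; only the information carried by those tokens disappears.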

🗃️ Dataset was deduplicated with SSCD, which helped reduce memorization (no further details about the dataset, though)
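Deduplicating with SSCD boils down to embedding every image with the SSCD copy-detection model, then dropping images whose embedding is too close to one already kept. The sketch below covers only that second step: it assumes the embeddings are already computed, and the greedy strategy, the `deduplicate` helper and the `threshold` value are hypothetical choices, not the paper's exact pipeline.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def deduplicate(embeddings: List[Sequence[float]], threshold: float) -> List[int]:
    """Greedily keep an image only if it is below `threshold` similarity to every kept image."""
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D "embeddings": index 1 is a near-copy of index 0.
    embs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [-1.0, 0.0]]
    print(deduplicate(embs, threshold=0.9))  # [0, 2, 3]
```

The pairwise loop is quadratic in the number of images; at dataset scale an approximate nearest-neighbor index would take its place.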

Variants
🔁 A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained and learned to do text replacement
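For readers new to DPO: the objective is a logistic loss on how much more the fine-tuned model prefers the chosen sample over the rejected one, measured relative to a frozen reference model. This sketch shows the standard DPO loss on per-sample log-likelihoods; diffusion adaptations of DPO (such as Diffusion-DPO) substitute denoising errors for those log-likelihoods, which is not shown here. `beta=0.1` is an arbitrary illustrative value.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(policy - ref) on chosen - (policy - ref) on rejected])."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    # Policy identical to the reference: no preference yet, loss = log 2.
    print(f"{dpo_loss(-3.0, -3.0, -3.0, -3.0):.4f}")  # 0.6931
    # Policy raised the chosen sample's likelihood and lowered the rejected one's.
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0) < math.log(2))  # True
```

Anchoring to the reference model keeps the fine-tuned model from drifting arbitrarily far just to widen the margin; `beta` controls how strongly that margin is rewarded.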

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (details are missing on the number of participants and the experimental design)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf

reacted to MehdiLeZ's post with ❤️ 10 months ago

Post

Dear music lovers 🕺,

MusicLang Space is now live: musiclang/README

MusicLang is a controllable model for music generation:

> 🦙 Discover the Llama 2 architecture, trained from scratch for symbolic music generation, ensuring exceptional quality;
> 👨‍🎨 Unleash your creativity by extending an existing piece of music or creating a new one from scratch;
> 🤖 Integrate MusicLang into your applications with CPU-optimized inference written in C; other integrations and optimizations are coming soon.

In the space, you’ll find:

1️⃣ MusicLang foundation model: our foundation model for generating original MIDI soundtracks, musiclang/musiclang-v2;

2️⃣ MusicLang predict: the AI prediction API of the MusicLang package, https://github.com/musiclang/musiclang_predict?tab=readme-ov-file;

3️⃣ MusicLang Language: a new language for tonal music. It lets composers load, write, transform and predict symbolic music in a simple, condensed and high-level way: https://github.com/MusicLang/musiclang;

4️⃣ MusicLang Demo Space: musiclang/musiclang-predict

5️⃣ Our Colab: https://colab.research.google.com/drive/1MA2mek826c05BjbWk2nRkVv2rW7kIU_S?usp=sharing

Help us share the future of music composition! Spread the word, show your support by starring the project, or contribute to it. ⭐️✨

Music Sounds Definitely Better with You 🎶 🖤

cc @floriangardin @MehdiLeZ @reach-vb

Thanks a lot,

The MusicLang team ❤️

reacted to trisfromgoogle's post with ❤️ 10 months ago

Post

I am thrilled to announce Gemma, new 2B and 7B models from Google, based on the same research and technology used to train the Gemini models! These models achieve state-of-the-art performance for their size, and are launched across Transformers, Google Cloud, and many other surfaces worldwide starting today.

Get started using and adapting Gemma in the model Collection: google/gemma-release-65d5efbccdbb8c4202ec078b

These launches are the product of an outstanding collaboration between the Google DeepMind and Hugging Face teams over the last few months. I'm very proud of the work both teams have done, from integration with Vertex AI to optimization across the stack. Read more about the partnership in the launch blog post by @philschmid @osanseviero @pcuenq: https://huggingface.co./blog/gemma

More information below if you are curious about training details, eval results, and safety characteristics!

Gemma Tech Report: https://goo.gle/GemmaReport
Launch announcement: https://blog.google/technology/developers/gemma-open-models/