The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.
Even more interesting is how they trained these cutting-edge models:
1️⃣ Architecture:
Llama 3.2 uses an optimized, auto-regressive transformer architecture. The largest models (11B and 90B) now support multimodal input, integrating both text and images.
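To make "multimodal input" concrete, here's roughly what inference looks like through the Hugging Face transformers integration (≥ 4.45). The model ID and class names follow the public 11B Vision model card, but treat the exact calls as a sketch rather than a definitive recipe:

```python
# Rough sketch of multimodal inference via Hugging Face transformers (>= 4.45).
# Exact processor behaviour may differ slightly from this outline.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("chart.png")  # any local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What does this chart show?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```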
2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs
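In code terms, that staged recipe boils down to something like the toy outline below. Names like `vision_adapter`, `noisy_pairs`, and `curated_pairs` are placeholders I made up for illustration, not Meta's actual modules or datasets:

```python
# Toy outline of the staged recipe above. All names are placeholders, not Meta's code.
import torch

def train_stage(model, dataloader, trainable_params, lr=1e-4):
    """Generic stage: only `trainable_params` are updated; everything else stays frozen."""
    optimizer = torch.optim.AdamW(trainable_params, lr=lr)
    for batch in dataloader:
        loss = model(**batch).loss   # assumes an HF-style model that returns a loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Stage 1: large-scale, noisy (image, text) pairs -- train only the new vision pieces.
# train_stage(model, noisy_pairs, list(model.vision_adapter.parameters()))
# Stage 2: smaller, curated and knowledge-enhanced pairs -- same trainable set, better data.
# train_stage(model, curated_pairs, list(model.vision_adapter.parameters()))
```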
3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language model parameters during adapter training
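Here's a minimal PyTorch sketch of that adapter idea: a gated cross-attention block bolted onto a frozen text decoder. Dimensions, gating, and naming are illustrative, not Meta's actual implementation:

```python
# Minimal sketch of a gated cross-attention adapter on top of a frozen text decoder.
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    """Feeds image features into the language model's hidden states via cross-attention."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # starts at 0, so the adapter is a no-op at init

    def forward(self, hidden_states, image_features):
        attn_out, _ = self.cross_attn(self.norm(hidden_states), image_features, image_features)
        return hidden_states + torch.tanh(self.gate) * attn_out  # residual keeps the text path intact

# Toy usage with small dimensions:
adapter = CrossAttentionAdapter(d_model=64, n_heads=4)
text_hidden = torch.randn(2, 10, 64)     # (batch, text tokens, dim)
image_feats = torch.randn(2, 16, 64)     # (batch, image patches, dim)
out = adapter(text_hidden, image_feats)  # same shape as text_hidden

# The "frozen" part: only adapter parameters would receive gradients, e.g.
# for p in language_model.parameters():  # `language_model` = pretrained text backbone (placeholder)
#     p.requires_grad = False
```

The residual-plus-zero-gate design means the model starts out behaving exactly like the original text-only Llama, which is how the text capabilities survive the vision training.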
4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward model ranking for high-quality fine-tuning data
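For those curious about the DPO step, the standard objective (from the original DPO paper by Rafailov et al., 2023) looks roughly like this; Meta's exact setup will differ in data, scale, and hyperparameters:

```python
# Standard DPO objective, sketched in PyTorch. Not Meta's training code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Push the policy to prefer the chosen answer more strongly than the frozen reference does."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Dummy summed log-probabilities for one preference pair:
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-14.0]),
                torch.tensor([-11.0]), torch.tensor([-13.0]))
```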
5️⃣ Lightweight Models:
• Used pruning and distillation techniques for 1B and 3B models
• Structured pruning from Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers
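The distillation side typically comes down to a loss like the one below: a generic temperature-scaled KL divergence that pulls the small student toward the big teacher's next-token distribution. This is the textbook recipe, sketched in PyTorch, not Meta's actual code:

```python
# Generic knowledge-distillation loss: the student mimics the teacher's
# temperature-softened token distribution. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * temperature ** 2

# Dummy logits with shape (tokens, vocab):
loss = distillation_loss(torch.randn(8, 32000), torch.randn(8, 32000))
```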
6️⃣ Context Length:
All models support an impressive 128K token context length.
7️⃣ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.
The result? A suite of models ranging from an edge-friendly 1B-parameter version to a powerful 90B-parameter one, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.
What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!