data-is-better-together-contributor (Data Is Better Together Contributor)

sayakpaul

posted an update 1 day ago

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

prithivMLmods

posted an update 3 days ago

Sketchify 😉🎨

+ strangerzonehf/Flux-Sketch-Smudge-LoRA
+ strangerzonehf/Flux-Sketch-Sized-LoRA
+ strangerzonehf/Sketch-Paint

- strangerzonehf/sketch-fav-675ba869c7ceaec7e652ee1c

davanstrien

posted an update 5 days ago

Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c

fdaudens

posted an update 6 days ago

🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024

prithivMLmods

posted an update 6 days ago

Qwen2VL Models: Vision and Language Processing 🍉

📍Fine-tunes: [ LaTeX OCR, Math Parsing, Text Analogy OCRTest ]

❄️Demo : prithivMLmods/Qwen2-VL-2B . The demo includes the Qwen2VL 2B Base Model.

🎯The Space extracts document content from the input image and outputs it as standardized plain text. It includes adjustment tools with over 30 font styles, file format support for PDF and DOCX, text alignment, font size adjustment, and line spacing controls.

📄PDFs are rendered using the ReportLab toolkit.

🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct

🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing

📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f

.
.
.
@prithivMLmods 🤗

burtenshaw

posted an update 6 days ago

People are flexing their end of year stats, so I made this app to show hub stats in a tidy design!

Thanks @Ameeeee and @jfcalvo for the feature from Argilla!
burtenshaw/recap

davidberenstein1957

posted an update 6 days ago

🐇 Tumble down the AI rabbit hole without any technical knowledge!

Explore AI models on the Hub with a simple, quick search.

Demo: davidberenstein1957/transformers-pipeline-playground

prithivMLmods

posted an update 7 days ago

🎄 Here Before - Xmas🎅✨

🧑🏻‍🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA

❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e

🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf

.
.
.
@prithivMLmods 🤗

fdaudens

posted an update 7 days ago

🤝 Want to share your AI models while protecting your work? Licenses are key!

Fascinating to see that nearly 60% of models on the Hub use Apache & MIT licenses.

Explore the viz here: huggingface/open-source-ai-year-in-review-2024

AtAndDev

posted an update 7 days ago

@s3nh Hey man, check your Discord! Got some news.

sayakpaul

posted an update 7 days ago

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

fdaudens

posted an update 8 days ago

Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?

I used natural language processing to cluster and map them — really helps spot patterns that weren't obvious when reading predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also this horizontal axis that spans from industry strategies to deep reflections on how to talk to the public.

Click any dot to explore the original prediction. What themes surprise/interest you the most?

👉 fdaudens/nieman_lab_2025_predictions_visualization

P.S.: I discovered that Nieman Lab's content is under a Creative Commons license!
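
The post does not say which NLP method produced the clustering. As a rough illustration only, the sketch below groups short texts using bag-of-words vectors and cosine similarity, a stand-in technique that is likely simpler than whatever powers the visualization. The function names, the `threshold` value, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering.

    Each text joins the first cluster whose seed (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`.
    """
    vectors = [Counter(tokenize(t)) for t in texts]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical example texts, not taken from the Nieman Lab predictions.
predictions = [
    "AI tools reshape newsroom workflows",
    "Newsroom workflows adapt to AI tools",
    "Election coverage dominates politics reporting",
    "Politics reporting shifts after the election",
]
groups = cluster(predictions)
print(groups)
```

A real pipeline would more plausibly use sentence embeddings plus a 2D projection to draw the map; the greedy grouping here only shows the basic idea of placing similar texts together.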

nataliaElv

posted an update 8 days ago

If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

https://www.youtube.com/watch?v=_-ORB4WAVGU

davidberenstein1957

posted an update 9 days ago

Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, so anyone can create datasets and models in minutes without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator

fdaudens

posted an update 11 days ago

The #NeurIPS2024 Class: Explore which research institutions are leading 🎓🔬

huggingface/open-source-ai-year-in-review-2024

prithivMLmods

posted an update 12 days ago

strangerzonehf/Flux-Sketch-Flat-LoRA

shayekh

authored 2 papers 12 days ago

bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents

Paper • 2308.10647 • Published Aug 21, 2023

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published 16 days ago • 25

alielfilali01

posted an update 12 days ago

Unpopular opinion: Open Source takes courage!

Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!

Cheers to the heroes here who see this!

fdaudens

posted an update 13 days ago

Are you at #NeurIPS2024? Check out our cool data visualizations about research papers in the Year in Review!

huggingface/open-source-ai-year-in-review-2024

Team members 89
