Andrea Soria

asoria

AI & ML interests

Maintainer of 🤗 Datasets: Data processing

asoria's activity

upvoted an article 21 days ago

LoRA training scripts of the world, unite!

• 43
upvoted an article 26 days ago
upvoted an article 27 days ago

Improving Parquet Dedupe on Hugging Face Hub

• 29
upvoted 5 articles about 2 months ago

Introducing BERTopic Integration with Hugging Face Hub

• 7

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

• 165

Introducing the SQL Console on Datasets

• 18

Fine-Tuning Gemma Models in Hugging Face

• 23
upvoted 2 articles 3 months ago

The 5 Most Under-Rated Tools on Hugging Face

• 85

SmolLM - blazingly fast and remarkably powerful

• 259
upvoted 5 articles 4 months ago

Docmatix - a huge dataset for Document Visual Question Answering

• 66

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

• 66

Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality

• 32

Experimenting with Automatic PII Detection on the Hub using Presidio

• 24

Announcing New Dataset Search Features

• 22
upvoted an article 5 months ago

How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

By chilijung
• 11
upvoted 3 articles 6 months ago

Synthetic dataset generation techniques: generating custom sentence similarity data

By davanstrien
• 15

Synthetic data: save money, time and carbon with open source

• 50

πŸ¦™βš—οΈ Using Llama3 and distilabel to build fine-tuning datasets

By dvilasuero
• 72
upvoted an article 7 months ago

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

• 23