Wikimedia Enterprise just dropped the full English & French Wikipedia on Hugging Face as structured JSON 🤯
Key points:
1. Parsed articles ready for machine learning pipelines
2. Perfect for AI model development, from pre-training to RAG
3. Includes metadata, Wikidata links, and content scores
4. Licensed under GFDL and CC BY-SA 4.0 (some content may have additional terms)
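For the RAG angle, here's a minimal sketch of turning one structured article into retrieval-ready passages that keep their metadata and Wikidata link. It assumes a simplified, hypothetical JSON shape (`name`, `url`, `abstract`, `main_entity.identifier`, and a flat `sections` list with `text`); the real dataset's schema is more nested, so check its Hugging Face dataset card and adjust the field access. `record_to_passages` and `SAMPLE_LINE` are illustrative names, not part of any official tooling.

```python
import json

# Hypothetical sample record. The field names and flat "sections" layout are
# ASSUMPTIONS for illustration; the real structured Wikipedia schema differs.
SAMPLE_LINE = json.dumps({
    "name": "Example article",
    "url": "https://en.wikipedia.org/wiki/Example",
    "abstract": "A short summary of the article.",
    "main_entity": {"identifier": "Q0"},  # placeholder Wikidata ID
    "sections": [
        {"name": "History", "text": "First section text."},
        {"name": "Usage", "text": "Second section text."},
    ],
})


def record_to_passages(line: str) -> list[dict]:
    """Split one JSON line into passages that each carry the article's metadata."""
    record = json.loads(line)
    # Metadata shared by every passage, so a retrieved chunk stays traceable.
    meta = {
        "title": record.get("name", ""),
        "url": record.get("url", ""),
        "wikidata_id": (record.get("main_entity") or {}).get("identifier"),
    }
    passages = []
    if record.get("abstract"):
        passages.append({**meta, "section": "Abstract", "text": record["abstract"]})
    for section in record.get("sections", []):
        text = section.get("text", "").strip()
        if text:  # skip empty sections
            passages.append({**meta, "section": section.get("name", ""), "text": text})
    return passages


if __name__ == "__main__":
    for p in record_to_passages(SAMPLE_LINE):
        print(p["wikidata_id"], p["section"], "|", p["text"])
```

Copying the metadata onto every passage means any chunk a retriever returns can be traced back to its source article and Wikidata item, which is handy when checking a model's answers for factual accuracy.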
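I've been testing it, and it's a game-changer. The structured format is like a supercharged version of the raw Wikipedia dumps: the articles arrive already parsed, so there's no raw markup to wrangle before you can use them.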
Thoughts on potential applications? I'm particularly interested in how this could improve AI language models' factual accuracy. Drop your ideas in the comments!
Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.