Florent Daudens

fdaudens

AI & ML interests

AI & Journalism

Posts 62

Wikimedia Enterprise just dropped full English & French Wikipedia on Hugging Face as structured JSON 🤯

Key points:
1. Parsed articles ready for machine learning pipelines
2. Perfect for AI model development - from pre-training to RAG
3. Includes metadata, Wikidata links, and content scores
4. Licensed under GFDL and CC BY-SA 4.0 (some content may have additional terms)

I've been testing it, and it's a game-changer. The structured format is like a supercharged version of raw Wiki dumps.

Thoughts on potential applications? I'm particularly interested in how this could improve AI language models' factual accuracy. Drop your ideas in the comments!

Dataset: wikimedia/structured-wikipedia
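
For anyone who wants to try the RAG use case from the key points, here is a minimal sketch, not an official recipe. It assumes the `datasets` library, a `train` split, and record fields named `name`, `abstract`, `url`, and `main_entity`. Those field names and the helper names `stream_articles` and `to_rag_document` are assumptions for illustration, so check the dataset card for the real schema and for the config name to pass in.

```python
from typing import Any, Dict, Iterator


def stream_articles(config_name: str) -> Iterator[Dict[str, Any]]:
    """Lazily yield article records from the Hub without downloading the full dump."""
    # Imported here so the helper below also works without `datasets` installed.
    from datasets import load_dataset

    # `config_name` selects the snapshot and language; copy it from the dataset card.
    # The "train" split name is an assumption.
    dataset = load_dataset(
        "wikimedia/structured-wikipedia",
        config_name,
        split="train",
        streaming=True,
    )
    yield from dataset


def to_rag_document(article: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one article into an id/text/metadata record for a retrieval index.

    The field names read here ("name", "abstract", "url", "main_entity")
    are assumptions about the schema, not confirmed by the post.
    """
    title = article.get("name") or ""
    abstract = article.get("abstract") or ""
    entity = article.get("main_entity") or {}
    return {
        "id": article.get("url") or title,
        "text": f"{title}\n\n{abstract}".strip(),
        "metadata": {"wikidata_id": entity.get("identifier")},
    }
```

Streaming keeps memory use flat, so you can pipe `to_rag_document(article)` for each record from `stream_articles(...)` straight into whatever index you use.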

#AI #OpenData #Wikipedia #MachineLearning

🚀 Your AI toolkit just got a major upgrade! I updated the Journalists on Hugging Face community's collection with tools for investigative work, content creation, and data analysis.

Sharing the new additions with links in case they're helpful:
- @wendys-llc's excellent 6-part video series on AI for investigative journalism https://www.youtube.com/playlist?list=PLewNEVDy7gq1_GPUaL0OQ31QsiHP5ncAQ
- @jeremycaplan's curated AI Spaces on HF https://wondertools.substack.com/p/huggingface
- @Xenova's Whisper Timestamped (with diarization!) for private, on-device transcription Xenova/whisper-speaker-diarization & Xenova/whisper-word-level-timestamps
- Flux models for image gen & LoRAs autotrain-projects/train-flux-lora-ease
- FineGrain's object cutter finegrain/finegrain-object-cutter and object eraser (this one's cool) finegrain/finegrain-object-eraser
- FineVideo: massive open-source annotated dataset + explorer HuggingFaceFV/FineVideo-Explorer
- Qwen2 chat demos, including 2.5 & multimodal versions (crushing it on handwriting recognition) Qwen/Qwen2.5 & Qwen/Qwen2-VL
- GOT-OCR integration stepfun-ai/GOT_official_online_demo
- HTML to Markdown converter maxiw/HTML-to-Markdown
- Text-to-SQL query tool by @davidberenstein1957 for HF datasets davidberenstein1957/text-to-sql-hub-datasets

There's a lot of potential here for journalism and beyond. Give these a try and let me know what you build!

You can also add your favorite ones if you're part of the community!

Check it out: https://huggingface.co/JournalistsonHF

#AIforJournalism #HuggingFace #OpenSourceAI
