nyuuzyou
posted an update 6 days ago
🌐 Public MediaWiki Collection Dataset - nyuuzyou/wikis

Collection of 1.66M+ articles from 930 public MediaWiki instances featuring:

- Full article content from diverse public wikis across the internet
- Complete metadata including templates, categories, and section structure
- Rich structural information preserving wiki organization and links
- Multilingual content across 35+ languages including English, Chinese, Spanish, and more
- Regional language variants including US/UK English, Brazilian Portuguese, and Traditional/Simplified Chinese

Key contents:
- 1,662,448 wiki articles with full text
- Extensive metadata including templates, categories, and sections
- Internal wikilinks and external reference information
- Cross-domain knowledge spanning multiple topics and fields
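
For anyone who wants to poke at the collection without downloading all of it, here is a rough sketch that streams records with the Hugging Face `datasets` library and tallies them by one field. The split name `"train"` and the field name `"language"` are my assumptions, not something stated in this post, so check the dataset card for the real schema (the repo may also require a config name). `stream_wikis` and `count_by_field` are just illustrative helper names.

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def stream_wikis(repo_id: str = "nyuuzyou/wikis",
                 split: str = "train") -> Iterator[Dict[str, Any]]:
    """Lazily yield records from the Hub (needs `pip install datasets`).

    The split name "train" is an assumption; check the dataset card.
    """
    # Imported here so count_by_field below stays usable without `datasets`.
    from datasets import load_dataset

    # streaming=True avoids downloading the whole collection up front.
    return iter(load_dataset(repo_id, split=split, streaming=True))


def count_by_field(records: Iterable[Dict[str, Any]],
                   field: str = "language",
                   limit: int = 1000) -> Counter:
    """Count the values of `field` over the first `limit` records.

    "language" is a hypothetical field name; records that lack the
    field are counted under "unknown".
    """
    counts: Counter = Counter()
    for record in islice(records, limit):
        counts[record.get(field, "unknown")] += 1
    return counts
```

Something like `count_by_field(stream_wikis()).most_common(5)` should then give a quick look at which values dominate the first thousand records, assuming the field exists under that name.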