yhavinga committed on
Commit d284ee0
1 Parent(s): e974e27

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -13,8 +13,8 @@ The tokenizer was trained on a comprehensive dataset, including:
  - English and Dutch Wikipedia (278M and 356M, respectively)
  - Dutch and English book datasets (211M and 355M, respectively)
  - Dutch news articles (256M)
- - CodeParrot GitHub code (158M)
- - CodeSearchNet diverse code (126M)
+ - CodeParrot GitHub Python code (158M)
+ - CodeSearchNet Python code (126M)
  - Markdown files with math markup (5.8M)
  - Arxiv scientific papers (169M)
 