Catherine Arnett

catherinearnett

AI & ML interests

multilingual NLP, tokenization

Organizations

Blog-explorers, Language and Cognition Lab (UCSD), PleIAs

Articles 4

They Said It Couldn’t Be Done

Releasing the largest multilingual open pretraining dataset

Datasets

None public yet