Aya Datasets
The Aya Collection is a massive multilingual collection for over 100 languages consisting of 513 million instances of prompts and completions.
The Aya Collection is a massive multilingual collection of 513 million prompt-completion instances covering a wide range of tasks. It incorporates instruction-style templates written by fluent speakers and applied to a curated list of datasets, as well as translations of instruction-style datasets into 101 languages.
CohereForAI/aya_dataset
The Aya Dataset is a multilingual instruction fine-tuning dataset curated by an open-science community via the Aya Annotation Platform from Cohere For AI. The dataset contains a total of 204k human-annotated prompt-completion pairs, along with demographic data about the annotators.
CohereForAI/aya_evaluation_suite
The Aya Evaluation Suite contains open-ended, conversation-style prompts for evaluating multilingual open-ended generation quality. To strike a balance between language coverage and the quality that comes with human curation, we create an evaluation suite that covers 101 languages for evaluating the conversational abilities of language models.
CohereForAI/aya_collection_language_split
This dataset has the same content as the original aya_collection and differs only in how the upload is structured. While the original aya_collection is organized into folders by dataset name, this dataset is split by language. We recommend this version if you only want to download the Aya Collection for a single language or a small set of languages.
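As a rough illustration of the single-language use case, the sketch below streams one language subset with the Hugging Face `datasets` library and takes a few rows. It rests on assumptions not stated in this listing: that each language is exposed as a named subset (the `"english"` name used in the usage note is hypothetical), that a `train` split exists, and that the helper names `load_language` and `take` are purely illustrative.

```python
from itertools import islice
from typing import Iterable

# Dataset name taken from the listing above.
REPO_ID = "CohereForAI/aya_collection_language_split"


def load_language(language: str, split: str = "train", streaming: bool = True):
    """Load one language subset of the language-split collection.

    ASSUMPTION: subsets are named by language and a "train" split exists;
    check the dataset card for the actual subset and split names.
    """
    # Imported lazily so the rest of the sketch works without the library.
    from datasets import load_dataset  # requires `pip install datasets`

    # Streaming avoids downloading the whole subset before reading rows.
    return load_dataset(REPO_ID, language, split=split, streaming=streaming)


def take(rows: Iterable[dict], n: int) -> list:
    """Return the first n rows of any iterable of row dictionaries."""
    return list(islice(rows, n))
```

A call such as `take(load_language("english"), 3)` would then return the first three rows of that subset, provided the subset name matches what the dataset card lists.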
CohereForAI/aya_redteaming
The Aya Red-teaming dataset is a human-annotated multilingual red-teaming dataset consisting of harmful prompts in 8 languages across 9 different categories of harm, with explicit labels for "global" and "local" harm.