Check dataset validity
Before you download a dataset from the Hub, it is helpful to know whether the specific dataset you’re interested in is available. The dataset viewer provides the /is-valid endpoint to check if a specific dataset works without any errors.
The API endpoint will return an error for datasets that cannot be loaded with the 🤗 Datasets library, for example, because the data hasn’t been uploaded or the format is not supported.
Use the preview field in the response of /is-valid to see whether a dataset is only partially supported.
This guide shows you how to check dataset validity programmatically, but feel free to try it out with Postman, RapidAPI, or ReDoc.
Check if a dataset is valid
/is-valid checks whether a specific dataset loads without any error. The endpoint takes a dataset query parameter, which must be set to the name of the dataset:
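import requests

# Placeholder: replace with your own Hugging Face access token.
API_TOKEN = "hf_xxx"
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=cornell-movie-review-data/rotten_tomatoes"

def query():
    # Send the GET request and decode the JSON body of the response.
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()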
The response looks like this if a dataset is valid:
{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": true
}
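The request above hard-codes one dataset in the URL. As a minimal sketch of how the same check could be made reusable, the helpers below build the /is-valid URL from any dataset name and send the token only when one is given. The names `build_is_valid_url` and `check_validity` are hypothetical, the standard library is used in place of requests to keep the sketch dependency-free, and it assumes that public datasets can be queried without a token.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://datasets-server.huggingface.co/is-valid"


def build_is_valid_url(dataset: str) -> str:
    """Return the /is-valid URL for a dataset name such as 'org/name'."""
    # safe="/" keeps the slash of the dataset name unescaped in the URL.
    return f"{BASE_URL}?{urlencode({'dataset': dataset}, safe='/')}"


def check_validity(dataset: str, token: Optional[str] = None) -> dict:
    """Call /is-valid and return the decoded JSON body (hypothetical helper)."""
    # Assumption: the Authorization header can be omitted for public datasets.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = Request(build_is_valid_url(dataset), headers=headers)
    # urlopen raises urllib.error.HTTPError when the server answers with an error status.
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `check_validity("cornell-movie-review-data/rotten_tomatoes")` would then return the same kind of dictionary as `query()` above.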
The response looks like this if a dataset is valid but /search is not available for it:
{
  "viewer": true,
  "preview": true,
  "search": false,
  "filter": true,
  "statistics": true
}
The response looks like this if a dataset is valid but /filter is not available for it:
{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": false,
  "statistics": true
}
Similarly, if the statistics are not available:
{
  "viewer": true,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": false
}
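Since each field is an independent boolean, one way to find out which features are missing is to collect the fields that are false. The sketch below assumes the response contains only the five fields shown in this guide; `unavailable_features` is a hypothetical helper name, not part of any library.

```python
# The five fields shown in the /is-valid responses of this guide.
FEATURES = ("viewer", "preview", "search", "filter", "statistics")


def unavailable_features(response: dict) -> list:
    """Return the names of the features reported as false or missing."""
    return [name for name in FEATURES if not response.get(name, False)]


# Decoded form of the response where only /search is unavailable.
example = {
    "viewer": True,
    "preview": True,
    "search": False,
    "filter": True,
    "statistics": True,
}
print(unavailable_features(example))  # ['search']
```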
If only the first rows of a dataset are available, then the response looks like:
{
  "viewer": false,
  "preview": true,
  "search": true,
  "filter": true,
  "statistics": true
}
Finally, if the dataset is not valid at all, then the response is:
{
  "viewer": false,
  "preview": false,
  "search": false,
  "filter": false,
  "statistics": false
}
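The last two cases can be told apart by looking at the viewer and preview fields alone. The following sketch maps a decoded response to one of three labels. The function name `support_level` and the labels are illustrative choices, not values returned by the API.

```python
def support_level(response: dict) -> str:
    """Classify an /is-valid response using only 'viewer' and 'preview'."""
    if response.get("viewer", False):
        # The full dataset can be browsed in the viewer.
        return "full viewer"
    if response.get("preview", False):
        # Only the first rows are available.
        return "preview only"
    # Neither the viewer nor the preview is available.
    return "not valid"


partial = {"viewer": False, "preview": True, "search": True,
           "filter": True, "statistics": True}
print(support_level(partial))  # preview only
```

Checking viewer before preview keeps the logic short, because every response in this guide that has viewer set to true also has preview set to true.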
Some cases where a dataset is not valid are:
- the dataset viewer is disabled
- the dataset is gated but the access is not granted: no token is passed or the passed token is not authorized
- the dataset is private but the owner is not a PRO user or an Enterprise Hub org
- the dataset contains no data or the data format is not supported