Inference Endpoints (dedicated) documentation

Send Requests to Endpoints

You can send requests to Inference Endpoints either from the UI, using the Inference Widget, or programmatically, e.g. with cURL, @huggingface/inference, huggingface_hub, or any REST client. The Endpoint overview provides an interactive widget for testing the Endpoint and also generates code for Python, JavaScript, and cURL, which you can use to get started quickly in your preferred language.

The examples below also show how to use the @huggingface/inference library to call an Inference Endpoint.

Use the UI to send requests

The Endpoint overview provides access to the Inference Widget, which can be used to send requests (see step 6 of Create an Endpoint). This lets you quickly test your Endpoint with different inputs and share it with team members.

Use cURL to send requests

The cURL command for the request above should look like this. You’ll need to provide your user token, which can be found in your Hugging Face account settings:

Example Request:

curl https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.huggingface.cloud/distilbert-sentiment \
  -X POST \
  -d '{"inputs": "Deploying my first endpoint was an amazing experience."}' \
  -H "Authorization: Bearer <Token>" \
  -H "Content-Type: application/json"
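The same kind of request can also be issued from JavaScript without a client library. The following is a minimal sketch, assuming a runtime with a global fetch (Node.js 18+ or a browser). buildJsonRequest and sendJsonRequest are hypothetical helper names, not part of any Hugging Face library, and the optional parameters argument stands for any task-specific parameters your Endpoint accepts.

```javascript
// Build the arguments for fetch() for a JSON (text) request.
// `parameters` is optional and is only added to the payload when provided.
function buildJsonRequest(endpointUrl, token, inputs, parameters) {
  const payload = parameters === undefined ? { inputs } : { inputs, parameters };
  return {
    url: endpointUrl,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request and parse the JSON response (assumes a global fetch).
async function sendJsonRequest(endpointUrl, token, inputs, parameters) {
  const { url, init } = buildJsonRequest(endpointUrl, token, inputs, parameters);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

With your own Endpoint URL and token, a call would look like sendJsonRequest(endpointUrl, "<Token>", "Deploying my first endpoint was an amazing experience."). Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.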

The Endpoints API offers the same API definitions as the Inference API and the SageMaker Inference Toolkit. All the request payloads are documented in the Supported Tasks section.

This means that for an NLP task, the payload is passed under the inputs key, and additional pipeline parameters go under the parameters key. You can provide any of the supported kwargs from pipelines as parameters. For image or audio tasks, you should send the data as a binary request with the corresponding MIME type. Below is an example cURL command for an audio payload:

curl --request POST \
  --url https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.huggingface.cloud/wav2vec-asr \
  --header 'Authorization: Bearer <Token>' \
  --header 'Content-Type: audio/x-flac' \
  --data-binary '@sample1.flac'
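For comparison, the binary request above could be sketched in JavaScript as follows. This is an illustration under assumptions rather than an official client: buildBinaryRequest is a hypothetical helper name, and a global fetch (Node.js 18+ or a browser) is assumed for actually sending the request.

```javascript
// Build the arguments for fetch() for a binary (image or audio) request.
// `data` holds the raw file bytes (for example a Uint8Array or Buffer);
// `mimeType` must match the file format, e.g. "audio/x-flac".
function buildBinaryRequest(endpointUrl, token, data, mimeType) {
  return {
    url: endpointUrl,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": mimeType,
      },
      body: data, // sent as-is, not JSON-encoded
    },
  };
}

// Example usage in Node.js (not executed here):
//   const { readFile } = require("node:fs/promises");
//   const audio = await readFile("sample1.flac");
//   const { url, init } = buildBinaryRequest(endpointUrl, "<Token>", audio, "audio/x-flac");
//   const result = await (await fetch(url, init)).json();
```

The only differences from a JSON request are the Content-Type header and the body, which carries the file bytes unchanged.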

To use your cURL command as code, use the cURL Converter tool to quickly get started with the programming language of your choice.

Use the JavaScript library @huggingface/inference

You can use the JavaScript library to call an Inference Endpoint:

import { HfInference } from '@huggingface/inference'

const inference = new HfInference('hf_...') // your user token

// Bind the client to a dedicated Endpoint URL, then call a task method on it
const gpt2 = inference.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2-endpoint')
const { generated_text } = await gpt2.textGeneration({ inputs: 'The answer to the universe is' })

Custom handler

@huggingface/inference supports the tasks listed at https://huggingface.co/tasks and is typed accordingly.

If your model takes additional inputs, or has custom inputs or outputs, you can use the more generic .request or .streamingRequest methods:

const output = await inference.request({
  inputs: "blablabla",
  parameters: {
    custom_parameter_1: ...,
    ...
  }
});
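To make the shape of such a call concrete, here is a small sketch with the placeholders filled in by invented values. It assumes only that the client exposes a request method taking an object with inputs and parameters, as in the snippet above. callCustomHandler and the parameter name max_items are hypothetical and must be replaced by whatever your custom handler actually expects.

```javascript
// Forward custom inputs and parameters through the generic request method.
// `client` is any object exposing request(args), such as the client created earlier.
async function callCustomHandler(client, inputs, parameters = {}) {
  return client.request({ inputs, parameters });
}

// Example usage (not executed here; `max_items` is a made-up parameter name):
//   const output = await callCustomHandler(inference, "some text", { max_items: 3 });
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object before pointing it at a live Endpoint.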