Send Requests to Endpoints
You can send requests to Inference Endpoints using the UI leveraging the Inference Widget or programmatically, e.g. with cURL, @huggingface/inference
, huggingface_hub
or any REST client. The Endpoint overview not only provides a interactive widget for you to test the Endpoint, but also generates code for python
, javascript
and curl
. You can use this code to quickly get started with your Endpoint in your favorite programming language.
Below are also examples on how to use the @huggingface/inference
library to call an inference endpoint.
Use the UI to send requests
The Endpoint overview provides access to the Inference Widget which can be used to send requests (see step 6 of Create an Endpoint). This allows you to quickly test your Endpoint with different inputs and share it with team members.
Use cURL to send requests
The cURL command for the request above should look like this. You’ll need to provide your user token which can be found in your Hugging Face account settings:
Example Request:
curl https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.huggingface.cloud/distilbert-sentiment \
-X POST \
-d '{"inputs": "Deploying my first endpoint was an amazing experience."}' \
-H "Authorization: Bearer <Token>"
The Endpoints API offers the same API definitions as the Inference API and the SageMaker Inference Toolkit. All the request payloads are documented in the Supported Tasks section.
This means for an NLP task, the payload is represented as the inputs
key and additional pipeline parameters are included in the parameters
key. You can provide any of the supported kwargs
from pipelines as parameters.
For image or audio tasks, you should send the data as a binary request with the corresponding mime type. Below is an example cURL for an audio payload:
curl --request POST \
--url https://uu149rez6gw9ehej.eu-west-1.aws.endpoints.huggingface.cloud/wav2vec-asr \
--header 'Authorization: Bearer <Token>' \
--header 'Content-Type: audio/x-flac' \
--data-binary '@sample1.flac'
To use your cURL command as code, use the cURL Converter tool to quickly get started with the programming language of your choice.
Use javascript library @huggingface/inference
You can use the javascript library to call an inference endpoint:
const inference = new HfInference('hf_...') // your user token
const gpt2 = inference.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2-endpoint')
const { generated_text } = await gpt2.textGeneration({ inputs: 'The answer to the universe is' })
Custom handler
@huggingface/inference
supports tasks from https://huggingface.co./tasks, and is typed accordingly.
If your model has additional inputs, or even custom inputs / outputs you can use the more generic .request
/ streamingRequest
:
const output = await inference.request({
inputs: "blablabla",
parameters: {
custom_parameter_1: ...,
...
}
});