Free Plug & Play Machine Learning API
Easily integrate NLP, audio and computer vision models deployed for inference via simple API calls. Harness the power of machine learning while staying out of MLOps!
Serve Machine Learning Models Without the Hassle
Acceleration and scalability built-in
Natural Language Processing Tasks
Text generation, text classification, token classification, zero-shot classification, feature extraction, NER, translation, summarization, conversational, question answering, table question answering, text2text generation and fill mask.
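As a rough illustration of one of these tasks, the sketch below builds a request body for zero-shot classification and picks the best label from a reply. The payload shape (`inputs` plus `parameters.candidate_labels`) and the `labels`/`scores` response fields are assumptions about the usual format for this task, and the helper names and sample reply are hypothetical.

```python
import json


def build_zero_shot_payload(text, candidate_labels):
    """Build the JSON body for a zero-shot classification request (assumed shape)."""
    if not candidate_labels:
        raise ValueError("at least one candidate label is required")
    return {"inputs": text, "parameters": {"candidate_labels": list(candidate_labels)}}


def top_label(response):
    """Return the (label, score) pair with the highest score."""
    pairs = zip(response["labels"], response["scores"])
    return max(pairs, key=lambda pair: pair[1])


payload = build_zero_shot_payload(
    "I need a refund for my order", ["billing", "shipping", "praise"]
)
body = json.dumps(payload)  # what would be sent as the POST body

# Hypothetical reply, hard-coded so the example runs offline.
sample_response = {
    "sequence": "I need a refund for my order",
    "labels": ["billing", "shipping", "praise"],
    "scores": [0.91, 0.06, 0.03],
}
best = top_label(sample_response)
```

Keeping payload construction and response parsing as separate pure functions makes both easy to check without a network call.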
Audio Tasks
Automatic speech recognition (ASR) and audio classification.
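For audio tasks the request body is typically the raw bytes of the audio file rather than JSON. The sketch below prepares such a request with the standard library without sending it. The model id `facebook/wav2vec2-base-960h`, the helper name, and the raw-bytes convention are assumptions made for illustration.

```python
import urllib.request

# Same base URL as the request example on this page.
API_BASE = "https://api-inference.huggingface.co/models"


def build_audio_request(audio_bytes, model_id, api_token):
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=audio_bytes,
        headers={"Authorization": f"Bearer {api_token}"},
        method="POST",
    )


# Placeholder bytes; in practice, read a real file opened in binary mode.
request = build_audio_request(
    b"\x00\x01", "facebook/wav2vec2-base-960h", "hf_XXXXXXXX"
)
# Sending would be: urllib.request.urlopen(request)
```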
Computer Vision Tasks
Object detection and image segmentation.
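An object detection reply usually needs some post-processing before use. The sketch below filters detections by confidence and computes a bounding-box area. The response shape (a list of items with `label`, `score`, and a `box` holding `xmin`/`ymin`/`xmax`/`ymax`) is an assumption, and the sample data is made up.

```python
def filter_detections(detections, min_score=0.9):
    """Keep detections at or above min_score, highest score first."""
    kept = [d for d in detections if d["score"] >= min_score]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


def box_area(box):
    """Area in pixels of a box given as corner coordinates."""
    width = max(0, box["xmax"] - box["xmin"])
    height = max(0, box["ymax"] - box["ymin"])
    return width * height


# Hypothetical reply, hard-coded so the example runs offline.
sample = [
    {"label": "cat", "score": 0.98, "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}},
    {"label": "dog", "score": 0.42, "box": {"xmin": 0, "ymin": 0, "xmax": 50, "ymax": 50}},
    {"label": "remote", "score": 0.95, "box": {"xmin": 200, "ymin": 40, "xmax": 260, "ymax": 70}},
]
confident = filter_detections(sample)
labels = [d["label"] for d in confident]
```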
How Does It Work?
State of the Art as easy as HTTP requests
import requests

def query(payload, model_id, api_token):
    headers = {"Authorization": f"Bearer {api_token}"}
    API_URL = f"https://api-inference.huggingface.co/models/{model_id}"
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

model_id = "distilbert-base-uncased"
api_token = "hf_XXXXXXXX" # get yours at hf.co/settings/tokens
data = query("The goal of life is [MASK].", model_id, api_token)
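The example above returns whatever JSON the server sends, including error bodies. A minimal sketch of how a caller might tell a successful reply from a failure follows. The conventions it encodes (HTTP 503 with an `estimated_time` field while a model is still loading, and an `error` field on failures) are assumptions about the service, and `interpret_response` is a hypothetical helper.

```python
def interpret_response(status_code, body):
    """Classify a reply as ("ok", data), ("loading", seconds) or ("error", message)."""
    if status_code == 200:
        return ("ok", body)
    # Assumed convention: a 503 with "estimated_time" means the model is warming up.
    if status_code == 503 and isinstance(body, dict) and "estimated_time" in body:
        return ("loading", body["estimated_time"])
    if isinstance(body, dict):
        return ("error", body.get("error", "unknown error"))
    return ("error", str(body))


# Hypothetical replies, hard-coded so the example runs offline.
ok = interpret_response(200, [{"token_str": "happiness", "score": 0.5}])
loading = interpret_response(503, {"error": "Model is loading", "estimated_time": 20.0})
failed = interpret_response(401, {"error": "Invalid token"})
```

To use this with `query`, the function would need to return `response.status_code` alongside `response.json()`.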
Fully-hosted API for AI
Up and running in minutes
50,000+ state-of-the-art models
- Instantly integrate ML models, deployed for inference via simple API calls.
Wide variety of machine learning tasks
- We support a broad range of NLP, audio, and vision tasks, including sentiment analysis, text generation, speech recognition, object detection and more!
Production ready
- We have built the most robust, secure and efficient AI infrastructure to handle production level loads with unmatched performance and reliability.
Real-time inferences
- We optimize and accelerate our models to serve predictions up to 10x faster, with the latency required for real-time applications.
Scalability
- The PRO Plan offers higher request rate limits to experiment with models. Need more? Use dedicated Inference Endpoints for guaranteed resources and autoscaling.
SLAs
- Production level support and 24/7 SLAs are available through our enterprise plans.
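Because request rate limits apply on the shared tiers, client code often retries when it is throttled. The sketch below shows exponential backoff around a generic sender. It assumes throttling is signalled with HTTP 429, which this page does not confirm, and `call_with_backoff` and the fake sender are hypothetical.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send must return a (status_code, body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body


# Fake sender: throttled twice, then succeeds. Delays are recorded, not slept.
replies = iter([(429, None), (429, None), (200, {"ok": True})])
delays = []
result = call_with_backoff(lambda: next(replies), sleep=delays.append)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.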
Why Inference API (serverless)?
Implement and iterate in no time
Leverage the largest and most diverse library of models for NLP, audio and computer vision to easily build machine learning powered applications in minutes.
Stay on the cutting edge of AI
Seamlessly upgrade to a new model so you're always up to date with the state of the art.
Focus on building
Stop worrying about infrastructure. We take care of models' performance and reliability at scale. Run models in milliseconds with just a few lines of code.
Let us do the machine learning
Harness the power of AI while staying out of data science and MLOps. Inference API (serverless) democratizes machine learning for all engineering teams.
Pricing
Use shared infrastructure for free, or switch to dedicated Inference Endpoints for production
🧪 PRO Plan
- Get free inference to explore models
- Higher rate limits for Inference API (serverless): experiment with text, audio, and vision models without hitting limits
- Support: email support and no SLAs
- Infrastructure: shared resources, no auto-scaling, standard latency
🏢 Enterprise
- Get dedicated resources for production inference
- Enterprise support for Inference Endpoints: custom pricing based on volume commit; starts at $2k/mo, annual contracts
- Support: production level support, 24/7 SLAs and uptime guarantees
- Infrastructure: auto-scaling, dedicated resources to achieve desired latency, and support for large models
Frequently Asked Questions
- What’s the latency?
- We accelerate our models on CPU and GPU so your apps work faster. Read up on how we achieved a 100x speedup on Transformers.
- Is my data secure?
- All data transfers are encrypted in transit with SSL. Hugging Face protects your inference data - no third-party access. Enterprise plans offer additional layers of security for log-less requests.
- What is your uptime?
- Check out our status page to learn more about our uptime and follow status updates on any identified performance issues.
- Do you offer SLAs?
- For the free tier, there is no service-level agreement (SLA) on support response times. However, enterprise plans include an SLA on support response times and uptime guarantees.
- Does it support large models?
- Large models (>10 GB) require dedicated infrastructure and maintenance to work reliably. We can support them through an enterprise plan with a yearly commitment.
- What’s your support email?
- For customer support and general inquiries about Inference Endpoints, please contact us at [email protected].