How to use this model in Amazon SageMaker?
I am unable to install this package in SageMaker:
pip install -U FlagEmbedding
You should be able to use it with the Hugging Face TEI (Text Embeddings Inference) container.
See here for more details on how to deploy it: https://huggingface.co/blog/sagemaker-huggingface-embedding
If you don't want to use SageMaker, you can also use Inference Endpoints here.
To make calls to it, do the following:
import requests

# Replace ENDPOINT_URL with your endpoint's URL and hf_token with your Hugging Face token.
API_URL = "ENDPOINT_URL/rerank"
headers = {
    "Accept": "application/json",
    "Authorization": "Bearer hf_token",
    "Content-Type": "application/json",
}

def query(payload):
    # POST the query and candidate texts to the rerank route and decode the JSON reply.
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]})
# [{'index': 1, 'score': 0.9976311}, {'index': 0, 'score': 0.12527926}]
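In case it helps: the response only carries indices and scores, so you still have to map them back to the texts you sent. Here is a minimal sketch of that step, assuming the response has the shape shown in the comment above (a list of dicts with `index` and `score`). `rank_texts` is a hypothetical helper name of mine, not part of TEI or `requests`.

```python
from typing import Dict, List, Tuple


def rank_texts(texts: List[str], results: List[Dict]) -> List[Tuple[str, float]]:
    """Pair each input text with its rerank score, highest score first.

    Assumes each result is a dict with an 'index' into `texts` and a 'score',
    as in the sample response above.
    """
    ranked = [(texts[r["index"]], r["score"]) for r in results]
    # Sort explicitly rather than relying on the order the server returned.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Sample data copied from the request and response above.
texts = ["Deep Learning is not...", "Deep learning is..."]
results = [{"index": 1, "score": 0.9976311}, {"index": 0, "score": 0.12527926}]

for text, score in rank_texts(texts, results):
    print(f"{score:.4f}  {text}")
```

With a live endpoint you would pass the same `texts` list you sent in the payload, together with the `output` returned by `query`.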
docker run command:
docker run --name bge_rrk_6201 -d -p 6201:80 -v /models:/data ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id /data/bge-reranker-v2-m3
docker logs output:
2024-09-14T10:45:19.877443Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "/dat*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "2774d18b0909", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: Some("sk-aaabbbcccdddeeefffggghhhiiijjjkkk"), json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-09-14T10:45:20.605567Z WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
2024-09-14T10:45:20.605594Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 8192
2024-09-14T10:45:20.606233Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 32 tokenization workers
2024-09-14T10:45:33.019924Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
thread '<unnamed>' panicked at backends/ort/src/lib.rs:363:30:
no entry found for key
stack backtrace:
0: 0x557f0f47be4c - <unknown>
1: 0x557f0f147080 - <unknown>
2: 0x557f0f4492a2 - <unknown>
3: 0x557f0f47d9fe - <unknown>
4: 0x557f0f47d170 - <unknown>
5: 0x557f0f47e332 - <unknown>
6: 0x557f0f47dd5c - <unknown>
7: 0x557f0f47dcb6 - <unknown>
8: 0x557f0f47dca1 - <unknown>
9: 0x557f0ed04534 - <unknown>
10: 0x557f0ed04b12 - <unknown>
11: 0x557f0f2a4d6f - <unknown>
12: 0x557f0f4bc820 - <unknown>
13: 0x557f0f482ba9 - <unknown>
14: 0x557f0f481a4d - <unknown>
15: 0x557f0f47efe5 - <unknown>
16: 0x7f6773a5c134 - <unknown>
17: 0x7f6773adba40 - clone
18: 0x0 - <unknown>
Does TEI support bge-reranker-v2-m3 or not?
I can use TEI to serve bge-m3.