Cohere embed-english-v3.0
This repository contains the tokenizer for the Cohere embed-english-v3.0
model. See our blog post Cohere Embed V3 for more details on this model.
You can use the embedding model via the Cohere API, AWS SageMaker, or in your own private deployment.
Usage Cohere API
The following code snippet shows how to use the Cohere API. Install the cohere SDK via:
pip install -U cohere
Get your free API key at: www.cohere.com
# This snippet shows an example of how to use the Cohere Embed V3 models for semantic search.
# Make sure the Cohere SDK is installed in at least v4.30: pip install -U cohere
# Get your API key from: www.cohere.com
import cohere
import numpy as np

cohere_key = "{YOUR_COHERE_API_KEY}"  # Get your API key from www.cohere.com
co = cohere.Client(cohere_key)

docs = ["The capital of France is Paris",
        "PyTorch is a machine learning framework based on the Torch library.",
        "The average cat lifespan is between 13-17 years"]

# Encode your documents with input type 'search_document'
doc_emb = co.embed(docs, input_type="search_document", model="embed-english-v3.0").embeddings
doc_emb = np.asarray(doc_emb)

# Encode your query with input type 'search_query'
query = "What is Pytorch"
query_emb = co.embed([query], input_type="search_query", model="embed-english-v3.0").embeddings
query_emb = np.asarray(query_emb)

# Compute the dot product between the query embedding and the document embeddings
scores = np.dot(query_emb, doc_emb.T)[0]

# Sort by descending score to find the best matches
max_idx = np.argsort(-scores)

print(f"Query: {query}")
for idx in max_idx:
    print(f"Score: {scores[idx]:.2f}")
    print(docs[idx])
    print("--------")
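The snippet above ranks documents by raw dot product. If you prefer cosine similarity (equivalent after normalizing each vector to unit length), the ranking step can be sketched in plain NumPy. The two-dimensional vectors below are toy stand-ins for the API output, not real Embed V3 embeddings:

```python
import numpy as np

# Toy stand-ins for co.embed(...) output (real embeddings have 1024 dimensions).
doc_emb = np.asarray([[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]])
query_emb = np.asarray([[0.7, 0.3]])

# Normalize rows to unit length; a dot product of unit vectors is the cosine similarity.
doc_norm = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
query_norm = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)

scores = (query_norm @ doc_norm.T)[0]
ranking = np.argsort(-scores)  # document indices, best match first
```

With the toy vectors above, the document closest in direction to the query ends up first in `ranking`, regardless of the vectors' magnitudes.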
Usage AWS SageMaker
The embedding model can be privately deployed in your AWS Cloud using our AWS SageMaker marketplace offering. It runs privately in your VPC, with latencies as low as 5ms for query encoding.
Usage AWS Bedrock
The model will soon also be available via AWS Bedrock. Stay tuned.
Private Deployment
Do you want to run the model on your own hardware? Contact Sales to learn more.
Supported Languages
This model was trained on nearly 1B English training pairs.
Evaluation results can be found in the Embed V3.0 Benchmark Results spreadsheet.
Evaluation results
All results are self-reported on the respective MTEB test sets.
- MTEB AmazonCounterfactualClassification (en): accuracy 81.299, ap 46.182, f1 75.477
- MTEB AmazonPolarityClassification: accuracy 95.618, ap 93.225, f1 95.616
- MTEB AmazonReviewsClassification (en): accuracy 51.720, f1 50.529
- MTEB ArguAna: ndcg_at_10 61.521
- MTEB ArxivClusteringP2P: v_measure 49.173