# BGE-large-en-v1.5-rag-int8-static

An INT8 quantized version of BAAI/BGE-large-en-v1.5, produced with Intel® Neural Compressor and compatible with Optimum-Intel.

The model can be used with the Optimum-Intel API as a standalone model, or as an embedder or ranker module in a fastRAG RAG pipeline.

## Technical details

Quantized using post-training static quantization.

| | |
|---|---|
| Calibration set | qasper (100 random samples) |
| Quantization tool | Optimum-Intel |
| Backend | IPEX |
| Original model | BAAI/BGE-large-en-v1.5 |

Instructions on how to reproduce the quantized model can be found here.
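For orientation, a minimal sketch of such a recipe using the Optimum-Intel `INCQuantizer` API is shown below. The dataset identifier, split, text field, and preprocessing are assumptions made for illustration; the linked instructions are authoritative.

```python
from functools import partial

from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer
from transformers import AutoModel, AutoTokenizer

model_id = "BAAI/bge-large-en-v1.5"
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess(examples, tokenizer):
    # the "abstract" field is an assumption about the qasper schema
    return tokenizer(examples["abstract"], padding="max_length", truncation=True, max_length=512)

# post-training static quantization calibrates activation ranges on a small sample set
quantizer = INCQuantizer.from_pretrained(model)
calibration_dataset = quantizer.get_calibration_dataset(
    "allenai/qasper",
    preprocess_function=partial(preprocess, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
quantizer.quantize(
    quantization_config=PostTrainingQuantConfig(approach="static"),
    calibration_dataset=calibration_dataset,
    save_directory="bge-large-en-v1.5-rag-int8-static",
)
```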

## Evaluation - MTEB

Model performance on the Massive Text Embedding Benchmark (MTEB) retrieval and reranking tasks.

| | INT8 | FP32 | % diff |
|---|---|---|---|
| Reranking | 0.5997 | 0.6003 | -0.108% |
| Retrieval | 0.5346 | 0.5429 | -1.53% |
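Numbers like these can be reproduced with the `mteb` package. The sketch below runs the FP32 baseline through `sentence-transformers` on a single illustrative retrieval task; evaluating the INT8 model would additionally require a thin wrapper exposing an `encode()` method, which is omitted here.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# FP32 baseline; the INT8 model needs a wrapper with an encode() method
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# SciFact is one of the MTEB retrieval tasks; the task choice is illustrative
evaluation = MTEB(tasks=["SciFact"])
evaluation.run(model, output_folder="mteb_results")
```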

## Usage

### Using with Optimum-Intel

See the Optimum-Intel installation page for instructions on how to install, or run:

```sh
pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers
```

Loading a model:

```python
from optimum.intel import IPEXModel

model = IPEXModel.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")
```

Running inference:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")

sentences = ["an example sentence to embed"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # take the vector of [CLS] as the sentence embedding
    embedded = outputs[0][:, 0]
```
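For retrieval, BGE embeddings are typically L2-normalized so that dot products correspond to cosine similarity. A short continuation of the snippet above:

```python
import torch.nn.functional as F

# L2-normalize so that the dot product of two vectors is their cosine similarity
embedded = F.normalize(embedded, p=2, dim=1)
scores = embedded @ embedded.T
```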

### Using with a fastRAG RAG pipeline

Get started by installing fastRAG as instructed here.
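Installation is typically a pip install (the package name `fastrag` is an assumption; the linked instructions are authoritative):

```sh
pip install fastrag
```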

Below is an example of loading the model into a ranker node that embeds and re-ranks all the documents it receives as input in a pipeline.

```python
from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("Intel/bge-large-en-v1.5-rag-int8-static")
```

and plugging it into a pipeline:

```python
from haystack import Pipeline

# `retriever` is assumed to be an already-initialized Haystack retriever node
p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
```
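Querying the assembled pipeline might then look like this (a sketch using the Haystack v1 query API; the `top_k` values are illustrative):

```python
results = p.run(
    query="What is post-training static quantization?",
    params={"retriever": {"top_k": 20}, "ranker": {"top_k": 5}},
)
print(results["documents"])
```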

See a more complete example notebook here.
