---
language: "en"
tags:
- dpr
- dense-passage-retrieval
- knowledge-distillation
datasets:
- ms_marco
---

# Margin-MSE Trained DistilBert for Dense Passage Retrieval

We provide a retrieval-trained DistilBERT-based model (we call the architecture BERT_Dot). Our model is trained with Margin-MSE, using a 3-teacher BERT_Cat (concatenated BERT scoring) ensemble, on MSMARCO-Passage.

This instance can be used to **re-rank a candidate set** or **directly for vector-index-based dense retrieval**. The architecture is a 6-layer DistilBERT without architectural additions or modifications (we only change the weights during training); to obtain a query/passage representation we pool the CLS vector. We use the same BERT layers for both query and passage encoding (this yields better results and lowers memory requirements).
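The CLS pooling and dot-product scoring described above can be sketched in a few lines. This is an illustrative sketch with stand-in arrays, not our actual inference code; in practice the hidden states come from the DistilBERT checkpoint loaded via Hugging Face `transformers`, and the function names here are only for illustration:

```python
import numpy as np

def cls_pool(last_hidden_state: np.ndarray) -> np.ndarray:
    """Pool the CLS vector: take the first token's representation.

    last_hidden_state: (batch, seq_len, hidden) encoder output.
    Returns: (batch, hidden) query/passage representations.
    """
    return last_hidden_state[:, 0, :]

def dot_score(query_vecs: np.ndarray, passage_vecs: np.ndarray) -> np.ndarray:
    """Relevance score = dot product between query and passage vectors."""
    return query_vecs @ passage_vecs.T

# Stand-in encoder outputs; note that queries and passages go through
# the *same* 6-layer DistilBERT encoder.
rng = np.random.default_rng(0)
query_out = rng.normal(size=(2, 8, 768))     # 2 queries, seq_len 8
passage_out = rng.normal(size=(5, 12, 768))  # 5 passages, seq_len 12

scores = dot_score(cls_pool(query_out), cls_pool(passage_out))
print(scores.shape)  # (2, 5): one score per query/passage pair
```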
If you want to know more about our simple yet effective knowledge distillation method for efficient information retrieval models (used for this model instance, and applicable to a variety of student architectures), check out our paper: https://arxiv.org/abs/2010.02666 🎉
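The core idea of Margin-MSE is compact: the student is trained to match the teacher's score *margin* between a relevant and a non-relevant passage, rather than the absolute scores. A minimal sketch of the objective (variable names are illustrative; see the paper and repository for the actual implementation):

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE: mean squared error between the student's and the
    teacher's score margin (pos - neg) over a batch of triples."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return np.mean((student_margin - teacher_margin) ** 2)

# Toy scores for a batch of 3 (query, pos, neg) triples; the teacher
# scores stand in for the (pre-averaged) BERT_Cat ensemble output.
s_pos = np.array([5.0, 4.0, 6.0])
s_neg = np.array([2.0, 3.5, 1.0])
t_pos = np.array([6.0, 4.5, 6.0])
t_neg = np.array([1.0, 3.0, 1.0])

loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)
print(loss)  # mean of (3-5)^2, (0.5-1.5)^2, (5-5)^2 = 5/3
```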
For more information, training data, source code, and a minimal usage example please visit: https://github.com/sebastian-hofstaetter/neural-ranking-kd

## Effectiveness on MSMARCO Passage & TREC-DL'19

We trained our model on the MSMARCO standard ("small", 400K-query) training triples with knowledge distillation, using a batch size of 32 on a single consumer-grade GPU (11 GB memory).

For re-ranking we used the top-1000 BM25 results.
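Re-ranking then amounts to encoding the query and the BM25 candidates, scoring each pair with a dot product, and sorting. A sketch with stand-in vectors in place of real encoder output:

```python
import numpy as np

rng = np.random.default_rng(42)
query_vec = rng.normal(size=768)               # CLS-pooled query vector
candidate_vecs = rng.normal(size=(1000, 768))  # encoded top-1000 BM25 candidates

scores = candidate_vecs @ query_vec  # dot-product relevance scores
reranked = np.argsort(-scores)       # candidate indices, best first
top10 = reranked[:10]                # final top-10 after re-ranking
```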
### MSMARCO-DEV

|                                      | MRR@10 | NDCG@10 | Recall@1K                   |
|--------------------------------------|--------|---------|-----------------------------|
| BM25                                 | .194   | .241    | .868                        |
| **Margin-MSE BERT_Dot** (Re-ranking) | .332   | .391    | .868 (from BM25 candidates) |
| **Margin-MSE BERT_Dot** (Retrieval)  | .323   | .381    | .957                        |
### TREC-DL'19

For MRR and Recall we binarize the graded relevance judgments at the recommended threshold of 2 (grades ≥ 2 count as relevant). This might skew comparisons against results computed with other binarization points.

|                                      | MRR@10 | NDCG@10 | Recall@1K                   |
|--------------------------------------|--------|---------|-----------------------------|
| BM25                                 | .689   | .501    | .739                        |
| **Margin-MSE BERT_Dot** (Re-ranking) | .862   | .712    | .739 (from BM25 candidates) |
| **Margin-MSE BERT_Dot** (Retrieval)  | .868   | .697    | .769                        |
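To make the binarization concrete: a result counts as relevant for MRR/Recall iff its graded label is at least 2. A sketch of MRR@10 under that convention (the function name is illustrative, not from our evaluation code):

```python
def mrr_at_10(ranked_grades, threshold=2):
    """MRR@10 with graded labels binarized at `threshold`:
    reciprocal rank of the first top-10 result whose grade is
    >= threshold, or 0 if none qualifies."""
    for rank, grade in enumerate(ranked_grades[:10], start=1):
        if grade >= threshold:
            return 1.0 / rank
    return 0.0

# Graded labels of one ranked result list (TREC-DL grades run 0-3).
grades = [1, 0, 3, 2, 0]
print(mrr_at_10(grades))               # first grade >= 2 sits at rank 3 -> 1/3
print(mrr_at_10(grades, threshold=1))  # rank 1 already qualifies -> 1.0
```

Shifting the threshold changes which results count as hits, which is exactly why scores computed at other binarization points are not directly comparable.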
For more baselines, info, and analysis, please see the paper: https://arxiv.org/abs/2010.02666
## Limitations & Bias

- The model inherits social biases from both DistilBERT and MSMARCO.

- The model is only trained on relatively short passages of MSMARCO (avg. 60 words in length), so it might struggle with longer text.

## Citation

If you use our model checkpoint please cite our work as: