Update README.md
--- a/README.md
+++ b/README.md
@@ -8,8 +8,6 @@ language:
 - es
 base_model:
 - MrLight/dse-qwen2-2b-mrl-v1
-datasets:
-- llamaindex/vdr-multilingual-train
 tags:
 - transformers
 - Qwen2-VL
@@ -19,17 +17,17 @@ tags:
 
 ![](cover.png)
 
-vdr-2b-multi-v1 is a multilingual model designed for visual document retrieval across multiple languages and domains. This model is designed to encode document page screenshots into dense single-vector representations, making it possible to search and query visually rich multilingual documents without any OCR, data extraction pipelines, or chunking.
+vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval across multiple languages and domains. This model is designed to encode document page screenshots into dense single-vector representations, making it possible to search and query visually rich multilingual documents without any OCR, data extraction pipelines, or chunking.
 
 
 - **Trained on 🇮🇹 Italian, 🇪🇸 Spanish, 🇬🇧 English, 🇫🇷 French and 🇩🇪 German:** together they form a new large, open-source, multilingual training dataset of 500k high-quality samples.
 
-- **Low VRAM and Faster Inference**: the English model achieves better results on the synthetic ViDoRe benchmarks with just 30% of the base model's image resolution, which results in 3x faster inference and much lower VRAM usage.
-
 - **Cross-lingual Retrieval**: substantially better in real-world scenarios; for example, it allows searching German documents with Italian queries.
 
 - **Matryoshka Representation Learning**: you can reduce the vector size 3x and still retain 98% of the embedding quality.
 
+To learn more about the model, read the [announcement blog post](https://huggingface.co/blog/marco/vdr-2b-multilingual).
+
 # Usage
 
 **Initialize model and processor**
@@ -175,6 +173,8 @@ The model is based on [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLig
 
 # Results
 
+![](ndcgtop.png)
+
 The model has been evaluated on the ViDoRe benchmark and on custom-built evaluation sets that test its multilingual capabilities on text-only, visual-only and mixed page screenshots. The evaluation dataset is publicly available [on Hugging Face](https://huggingface.co/datasets/llamaindex/vdr-multilingual-test).
 
 All evaluations are performed by calculating **NDCG@5** scores using **1536 dimensions** per vector and an image resolution that can be represented with **at most 768 tokens**.
@@ -212,4 +212,4 @@ All evaluations are performed by calculating **NDCG@5** scores using **1536 dime
 | | **Avg** | **shiftproject** | **government** | **healthcare** | **energy** | **ai** | **docvqa** | **arxivqa** | **tatdqa** | **infovqa** | **tabfquad** |
 |--------------------:|---------:|-----------------:|---------------:|---------------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-------------:|
 | dse-qwen2-2b-mrl-v1 | 83.6 | 79.8 | **95.7** | **96.9** | **92** | 98.2 | 56.3 | **85.2** | **53.9** | **87.5** | 90.3 |
-| vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
+| vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
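The **Matryoshka Representation Learning** bullet says the vectors can be reduced 3x, which in this README means going from the 1536 dimensions used in the evaluations down to 512. The sketch below assumes the usual Matryoshka recipe: keep the leading components of each vector, then re-normalize. It uses random NumPy arrays as hypothetical stand-ins for model outputs. The helpers `truncate_and_normalize` and `rank_documents` are illustrative names, not part of the model's API.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)  # clip avoids division by zero


def rank_documents(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return document indices ordered by descending cosine similarity to `query`."""
    scores = docs @ query  # all vectors are unit-norm, so the dot product is the cosine similarity
    return np.argsort(-scores)


# Hypothetical stand-ins for model outputs: 4 page embeddings and 1 query, 1536 dimensions each.
rng = np.random.default_rng(0)
docs_full = rng.normal(size=(4, 1536))
query_full = docs_full[2] + 0.1 * rng.normal(size=1536)  # the query is built to match page 2

for dim in (1536, 512):  # full size vs. the 3x smaller Matryoshka prefix
    docs = truncate_and_normalize(docs_full, dim)
    query = truncate_and_normalize(query_full[None, :], dim)[0]
    print(dim, rank_documents(query, docs))
```

Re-normalizing after truncation keeps dot product and cosine similarity interchangeable, so a vector index can use either one. Check the README's Usage section to confirm whether the released model expects exactly this step.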
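The results above are reported as **NDCG@5**. The following minimal single-query implementation illustrates what that metric rewards. It assumes binary relevance, meaning each query has a set of equally relevant pages, and `ndcg_at_k` and the page ids are hypothetical names. Benchmark scores are averaged over all queries, and the table appears to report them multiplied by 100.

```python
import math


def ndcg_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Binary-relevance NDCG@k for one query.

    `ranked_ids` is the retrieval order (best first); `relevant_ids` is the gold set.
    """
    # DCG: a relevant hit at 0-based position r contributes 1 / log2(r + 2),
    # so a hit in the top position counts fully and later hits are discounted.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    # Ideal DCG: all relevant pages (up to k) placed at the top of the ranking.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# One relevant page retrieved at position 3 of the top 5.
print(ndcg_at_k(["p7", "p2", "p9", "p4", "p1"], {"p9"}))  # 0.5
```

A relevant page ranked first scores 1.0, and a page that falls outside the top 5 contributes nothing. This is why NDCG@5 favors models that push the correct page to the very top.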