Image-Text-to-Text
sentence-transformers
Safetensors
Transformers
qwen2_vl
Qwen2-VL
conversational
marco committed
Commit 6f270fc · verified · 1 Parent(s): 6cdb035

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -8,8 +8,6 @@ language:
  - es
 base_model:
  - MrLight/dse-qwen2-2b-mrl-v1
- datasets:
- - llamaindex/vdr-multilingual-train
 tags:
  - transformers
  - Qwen2-VL
@@ -19,17 +17,17 @@ tags:
 
 ![](cover.png)
 
- vdr-2b-multi-v1 is a multilingual model designed for visual document retrieval across multiple languages and domains. It encodes document page screenshots into dense single-vector representations, which makes it possible to search and query visually rich multilingual documents without OCR, data-extraction pipelines, or chunking.
+ vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval across multiple languages and domains. It encodes document page screenshots into dense single-vector representations, which makes it possible to search and query visually rich multilingual documents without OCR, data-extraction pipelines, or chunking.
 
 
 - **Trained on 🇮🇹 Italian, 🇪🇸 Spanish, 🇬🇧 English, 🇫🇷 French and 🇩🇪 German:** together they form a new large, open-source, multilingual training dataset of 500k high-quality samples.
 
- - **Low VRAM and Faster Inference**: the English model achieves better results on the synthetic ViDoRe benchmarks with just 30% of the base model's image resolution, resulting in 3x faster inference and much lower VRAM usage.
-
 - **Cross-lingual Retrieval**: substantially better in real-world scenarios. For example, this allows searching German documents with Italian queries.
 
 - **Matryoshka Representation Learning**: you can reduce the vector size by 3x and still keep 98% of the embedding quality.
 
+ To learn more about the model, read the [announcement blogpost](https://huggingface.co/blog/marco/vdr-2b-multilingual).
+
 # Usage
 
 **Initialize model and processor**
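The usage code itself sits outside this diff's context lines, so as a rough orientation only: the sketch below shows one plausible way to initialize the model and embed a query plus a page screenshot through `sentence-transformers` (one of the libraries tagged on this page). The repo id `llamaindex/vdr-2b-multi-v1`, the `trust_remote_code` flag, and image support in `encode()` are assumptions, not details taken from this commit.

```python
# Hedged sketch (not the model card's official snippet): embed a text query and a
# document page screenshot into single dense vectors and score them by cosine similarity.
# Assumed: repo id "llamaindex/vdr-2b-multi-v1", a sentence-transformers interface
# with trust_remote_code=True, and encode() accepting PIL images.
import torch
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "llamaindex/vdr-2b-multi-v1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# An Italian query against a (possibly German) page screenshot: the cross-lingual case above.
query_emb = model.encode(["quota di energia rinnovabile per anno"])
page_emb = model.encode([Image.open("page_screenshot.png")])

# Higher cosine similarity = better match.
print(model.similarity(query_emb, page_emb))
```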
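The Matryoshka bullet above amounts to keeping only the leading dimensions of each embedding and re-normalizing. A minimal, self-contained sketch of that truncation (the 1536 and 512 sizes are illustrative, matching the 3x reduction claim):

```python
# Matryoshka-style truncation: keep the first k dimensions and L2-normalize again,
# trading a small amount of quality for 3x smaller vectors.
import torch
import torch.nn.functional as F

def truncate_embeddings(emb: torch.Tensor, k: int = 512) -> torch.Tensor:
    """Slice (..., 1536) embeddings down to k dimensions and re-normalize."""
    return F.normalize(emb[..., :k], p=2, dim=-1)

full = F.normalize(torch.randn(4, 1536), dim=-1)  # stand-in for model outputs
small = truncate_embeddings(full, k=512)          # 3x smaller vectors

print(small.shape)      # torch.Size([4, 512])
print(small @ small.T)  # cosine-similarity matrix, still usable for retrieval
```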
@@ -175,6 +173,8 @@ The model is based on [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLig
 
 # Results
 
+ ![](ndcgtop.png)
+
 The model has been evaluated on the ViDoRe benchmark and on custom-built evaluation sets that allow testing its multilingual capabilities on text-only, visual-only and mixed page screenshots. The evaluation dataset is publicly available [here on HuggingFace](https://huggingface.co/datasets/llamaindex/vdr-multilingual-test).
 
 All evaluations are performed by calculating **NDCG@5** scores using **1536-dimension** vectors and an image resolution that can be represented with a **maximum of 768 tokens**.
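For orientation, NDCG@5 rewards ranking the relevant page near the top of the first five results. A small self-contained sketch of the metric under a binary-relevance assumption (this is not the evaluation harness used for the table below):

```python
# NDCG@5 for one query: discounted gain over the top-5 ranked documents,
# normalized by the ideal ordering. Binary relevance labels assumed.
import math

def ndcg_at_k(relevances: list[int], k: int = 5) -> float:
    """relevances: relevance of retrieved documents, in ranked order."""
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# The single relevant page was retrieved at rank 2:
print(round(ndcg_at_k([0, 1, 0, 0, 0]), 3))  # 0.631
```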
@@ -212,4 +212,4 @@ All evaluations are performed by calculating **NDCG@5** scores using **1536 dime
 | | **Avg** | **shiftproject** | **government** | **healthcare** | **energy** | **ai** | **docvqa** | **arxivqa** | **tatdqa** | **infovqa** | **tabfquad** |
 |--------------------:|---------:|-----------------:|---------------:|---------------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-------------:|
 | dse-qwen2-2b-mrl-v1 | 83.6 | 79.8 | **95.7** | **96.9** | **92** | 98.2 | 56.3 | **85.2** | **53.9** | **87.5** | 90.3 |
- | vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
+ | vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
 