Image-Text-to-Text
sentence-transformers
Safetensors
Transformers
qwen2_vl
Qwen2-VL
conversational
marco committed
Commit 6f270fc · verified · 1 Parent(s): 6cdb035

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -8,8 +8,6 @@ language:
  - es
 base_model:
  - MrLight/dse-qwen2-2b-mrl-v1
- datasets:
- - llamaindex/vdr-multilingual-train
 tags:
  - transformers
  - Qwen2-VL
@@ -19,17 +17,17 @@ tags:
 
 ![](cover.png)
 
- vdr-2b-multi-v1 is a multilingual model designed for visual document retrieval across multiple languages and domains. It encodes document page screenshots into dense single-vector representations, which makes it possible to search and query visually rich multilingual documents without OCR, data-extraction pipelines, or chunking.
+ vdr-2b-multi-v1 is a multilingual embedding model designed for visual document retrieval across multiple languages and domains. It encodes document page screenshots into dense single-vector representations, which makes it possible to search and query visually rich multilingual documents without OCR, data-extraction pipelines, or chunking.
 
 
 - **Trained on 🇮🇹 Italian, 🇪🇸 Spanish, 🇬🇧 English, 🇫🇷 French and 🇩🇪 German:** together they form a new large, open-source, multilingual training dataset of 500k high-quality samples.
 
- - **Low VRAM and Faster Inference**: the English model achieves better results on the synthetic ViDoRe benchmarks with just 30% of the base model's image resolution, resulting in 3x faster inference and much lower VRAM usage.
-
 - **Cross-lingual Retrieval**: substantially better in real-world scenarios. For example, this allows searching German documents with Italian queries.
 
 - **Matryoshka Representation Learning**: you can reduce the vector size by 3x and still keep 98% of the embedding quality.
 
+ To learn more about the model, read the [announcement blogpost](https://huggingface.co/blog/marco/vdr-2b-multilingual).
+
 # Usage
 
 **Initialize model and processor**
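The usage code itself sits outside this diff's context lines, so as a rough orientation only: the sketch below shows one plausible way to initialize the model and embed a query plus a page screenshot through `sentence-transformers` (one of the libraries tagged on this page). The repo id `llamaindex/vdr-2b-multi-v1`, the `trust_remote_code` flag, and image support in `encode()` are assumptions, not details taken from this commit.

```python
# Hedged sketch (not the model card's official snippet): embed a text query and a
# document page screenshot into single dense vectors and score them by cosine similarity.
# Assumed: repo id "llamaindex/vdr-2b-multi-v1", a sentence-transformers interface
# with trust_remote_code=True, and encode() accepting PIL images.
import torch
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "llamaindex/vdr-2b-multi-v1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# An Italian query against a (possibly German) page screenshot: the cross-lingual case above.
query_emb = model.encode(["quota di energia rinnovabile per anno"])
page_emb = model.encode([Image.open("page_screenshot.png")])

# Higher cosine similarity = better match.
print(model.similarity(query_emb, page_emb))
```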
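The Matryoshka bullet above amounts to keeping only the leading dimensions of each embedding and re-normalizing. A minimal, self-contained sketch of that truncation (the 1536 and 512 sizes are illustrative, matching the 3x reduction claim):

```python
# Matryoshka-style truncation: keep the first k dimensions and L2-normalize again,
# trading a small amount of quality for 3x smaller vectors.
import torch
import torch.nn.functional as F

def truncate_embeddings(emb: torch.Tensor, k: int = 512) -> torch.Tensor:
    """Slice (..., 1536) embeddings down to k dimensions and re-normalize."""
    return F.normalize(emb[..., :k], p=2, dim=-1)

full = F.normalize(torch.randn(4, 1536), dim=-1)  # stand-in for model outputs
small = truncate_embeddings(full, k=512)          # 3x smaller vectors

print(small.shape)      # torch.Size([4, 512])
print(small @ small.T)  # cosine-similarity matrix, still usable for retrieval
```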
@@ -175,6 +173,8 @@ The model is based on [MrLight/dse-qwen2-2b-mrl-v1](https://huggingface.co/MrLig
 
 # Results
 
+ ![](ndcgtop.png)
+
 The model has been evaluated on the ViDoRe benchmark and on custom-built evaluation sets that allow testing its multilingual capabilities on text-only, visual-only and mixed page screenshots. The evaluation dataset is publicly available [here on HuggingFace](https://huggingface.co/datasets/llamaindex/vdr-multilingual-test).
 
 All evaluations are performed by calculating **NDCG@5** scores using **1536-dimension** vectors and an image resolution that can be represented with a **maximum of 768 tokens**.
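For orientation, NDCG@5 rewards ranking the relevant page near the top of the first five results. A small self-contained sketch of the metric under a binary-relevance assumption (this is not the evaluation harness used for the table below):

```python
# NDCG@5 for one query: discounted gain over the top-5 ranked documents,
# normalized by the ideal ordering. Binary relevance labels assumed.
import math

def ndcg_at_k(relevances: list[int], k: int = 5) -> float:
    """relevances: relevance of retrieved documents, in ranked order."""
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# The single relevant page was retrieved at rank 2:
print(round(ndcg_at_k([0, 1, 0, 0, 0]), 3))  # 0.631
```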
@@ -212,4 +212,4 @@ All evaluations are performed by calculating **NDCG@5** scores using **1536 dime
 | | **Avg** | **shiftproject** | **government** | **healthcare** | **energy** | **ai** | **docvqa** | **arxivqa** | **tatdqa** | **infovqa** | **tabfquad** |
 |--------------------:|---------:|-----------------:|---------------:|---------------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-------------:|
 | dse-qwen2-2b-mrl-v1 | 83.6 | 79.8 | **95.7** | **96.9** | **92** | 98.2 | 56.3 | **85.2** | **53.9** | **87.5** | 90.3 |
- | vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
+ | vdr-2b-multi-v1 | **84.0** | **82.4** | 95.5 | 96.5 | 91.2 | **98.5** | **58.5** | 84.7 | 53.6 | 87.1 | **92.2** |
 