We've been busy cooking up some interesting models at @jinaai, with a recent highlight being the release of our first batch of bilingual embedding models.
Internally labeled as X+EN, where X represents the target language and EN stays fixed, these models specialize in both monolingual tasks and cross-lingual retrieval from X to EN. You can find them on Hugging Face:
1. German-English bilingual embedding: jinaai/jina-embeddings-v2-base-de
2. Chinese-English bilingual embedding: jinaai/jina-embeddings-v2-base-zh
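As a rough sketch of how these embeddings are typically used for X-to-EN retrieval: encode a query in the target language and the English documents, then rank by cosine similarity. The model-loading step is shown in a comment (it assumes the `transformers` package and requires `trust_remote_code=True` for the model's custom pooling code); placeholder vectors stand in for real embeddings so the snippet runs standalone.

```python
import numpy as np

# With a released model one would first obtain real embeddings, e.g.:
#   from transformers import AutoModel
#   model = AutoModel.from_pretrained("jinaai/jina-embeddings-v2-base-de",
#                                     trust_remote_code=True)
#   doc_vecs = model.encode(english_docs)
#   query_vec = model.encode([german_query])[0]
# Below, small placeholder vectors stand in for those embeddings.

def cosine_sim(query, docs):
    """Cosine similarity between one vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

query_vec = np.array([0.2, 0.9, 0.1])        # placeholder: a German query
doc_vecs = np.array([[0.1, 0.8, 0.2],        # placeholder: English doc A
                     [0.9, 0.1, 0.0]])       # placeholder: English doc B

scores = cosine_sim(query_vec, doc_vecs)
best = int(np.argmax(scores))                # index of the closest document
```

Since query and documents share one embedding space, the same ranking works whether the query is in German, Chinese, or English.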
We're also excited to announce that a Spanish bilingual embedding will be released in approximately two weeks.
Our evaluation across various MLM tasks shows that, thanks to its focus on just two languages, our bilingual backbone consistently outperforms state-of-the-art multilingual backbones such as XLM-RoBERTa.
Despite being roughly three times smaller, our released bilingual embedding models outperform the leading multilingual model, e5-multilingual-large, on both monolingual and cross-lingual search tasks.
Currently, we're putting the finishing touches on the technical report, which should be available on arXiv by next week.
Looking ahead, the embedding team is gearing up for jina-embeddings-v3, with some initial groundwork already underway. Stay tuned for more updates!