Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT Paper • 2402.07440 • Published Feb 12 • 1
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection Paper • 2406.00856 • Published Jun 2 • 11
NeMo Curator - Classifier Models Collection Classifier models that can be used in NeMo Curator for labelling/filtering datasets. • 9 items • Updated 12 days ago • 9
MMTEB Collection Our contribution to the Massive Multilingual Text Embedding Benchmark (MMTEB). Retrieval and reranking benchmarks in 16 languages. • 4 items • Updated Jun 6 • 1
Arctic-embed Collection A collection of text embedding models optimized for retrieval accuracy and efficiency • 8 items • Updated 20 days ago • 17
ColPali Models Collection Pre-trained checkpoints for the ColPali model. • 8 items • Updated 17 days ago • 2
Small LMs Text Embedding Collection Contrastive fine-tuned version of Language Models up to 2B parameters using LoRA • 3 items • Updated May 8 • 4
Matryoshka Embedding Models Collection https://huggingface.co./blog/matryoshka • 14 items • Updated Jun 4 • 15
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 56
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 5 days ago • 17
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 129
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21 • 62