Functional Interpolation for Relative Positions Improves Long Context Transformers Paper • 2310.04418 • Published Oct 6, 2023 • 4
SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs Paper • 2106.09997 • Published Jun 18, 2021 • 2
Neural Machine Translation of Rare Words with Subword Units Paper • 1508.07909 • Published Aug 31, 2015 • 4 (see the BPE sketch after this list)
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models Paper • 2403.14438 • Published Mar 21, 2024 • 2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 45
RoBERTa: A Robustly Optimized BERT Pretraining Approach Paper • 1907.11692 • Published Jul 26, 2019 • 7
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation Paper • 2103.06874 • Published Mar 11, 2021 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 103
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 75
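The subword-units entry above (Sennrich et al., 1508.07909) introduced byte-pair encoding (BPE) for open-vocabulary NMT. As a quick illustration, here is a minimal Python sketch of BPE merge learning, closely following the toy example given in that paper; the corpus and merge count below are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of byte-pair-encoding merge learning, in the spirit of
# Sennrich et al. (1508.07909). The toy vocabulary and the number of
# merges are illustrative, not the paper's experimental configuration.
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every standalone occurrence of the pair with its merged symbol."""
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy word-frequency table; each word is a sequence of space-separated
# characters plus an end-of-word marker, as in the paper's example.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

for _ in range(10):  # learn 10 merge operations
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print('merged:', best)
```

Running the loop greedily merges the most frequent pair each iteration (here 'e s' first, then 'es t', and so on), growing a subword vocabulary that rare words can be segmented with at test time.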