
[Model card: sbintuitions/modernbert-ja-130m · Fill-Mask]
On entailment-adjacent tasks (and btw, great work on the zero-shot NLI models, @MoritzLaurer!), I'd expect DeBERTa to be slightly better than ModernBERT -- DeBERTa's pretraining objective seems better aligned with that kind of task. In our evals, DeBERTa consistently came out on top on MNLI (there's a full GLUE table in the appendix of the paper); it was only on the aggregated GLUE score that ModernBERT-Base beat DeBERTaV3-Base.
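
For anyone who wants to poke at this themselves, here's a minimal sketch of the kind of entailment-based comparison I mean, using the `transformers` zero-shot-classification pipeline (which runs a model as an NLI classifier under the hood). The DeBERTa checkpoint is one of @MoritzLaurer's public NLI models; the ModernBERT checkpoint name is just a placeholder you'd swap for an actual NLI fine-tune:

```python
# Sketch: compare two NLI-finetuned encoders on a zero-shot example.
from transformers import pipeline

sequence = "A soccer game with multiple males playing."
labels = ["sports", "cooking", "politics"]

# One of @MoritzLaurer's public DeBERTaV3 NLI checkpoints.
deberta = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli",
)

# Hypothetical checkpoint name -- replace with a real ModernBERT NLI fine-tune.
modernbert = pipeline(
    "zero-shot-classification",
    model="your-org/modernbert-base-nli",
)

for name, clf in [("DeBERTa", deberta), ("ModernBERT", modernbert)]:
    result = clf(sequence, candidate_labels=labels)
    print(name, list(zip(result["labels"], result["scores"])))
```

A single example like this obviously won't settle anything; the MNLI/GLUE numbers in the paper's appendix are the right place to look for the aggregate picture.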