Clarification about the config properties w.r.t. the paper
In https://huggingface.co./answerdotai/ModernBERT-base/blob/main/config.json , we see "hidden_activation": "gelu" and "position_embedding_type": "absolute" (even though rope related settings do appear in the config as well), whereas the paper says that GeGLU and RoPE are used respectively. Is it expected and a strangeness coming from the transformers library itself or is it a misconfig/export ? Thanks
As we mention in the paper, GeGLU is GLU with GeLU instead of sigmoid, so "hidden_activation": "gelu" is correct:
> We adopt GeGLU (Shazeer, 2020), a Gated-Linear Units (GLU)-based (Dauphin et al., 2017) activation function built on top of the original BERT’s GeLU.
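For concreteness, here is a minimal sketch of a GeGLU gate in PyTorch; the class name and dimensions are illustrative, not ModernBERT's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """GLU-style feed-forward gate using GeLU instead of sigmoid (Shazeer, 2020)."""

    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        # A single projection produces both the value and the gate halves.
        self.proj = nn.Linear(dim_in, 2 * dim_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj(x).chunk(2, dim=-1)
        # This GeLU on the gate is what "hidden_activation": "gelu" refers to.
        return value * F.gelu(gate)

# Usage: a batch of 8 vectors of width 768 -> hidden width 1152 (sizes are arbitrary).
out = GeGLU(768, 1152)(torch.randn(8, 768))
```

Swapping `F.gelu` for `torch.sigmoid` would recover the original GLU, which is why the config only needs to record the gating activation.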
I believe position_embedding_type
is a default config argument in transformers. ModernBERT doesn't use it; I'll have to check whether we can remove it from the config.
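As a quick sanity check, you can inspect which of these fields the loaded config actually carries; the RoPE key names below are assumptions based on the linked config.json, not confirmed API:

```python
from transformers import AutoConfig

# Load the published config (requires a transformers version with ModernBERT support).
config = AutoConfig.from_pretrained("answerdotai/ModernBERT-base")

# "absolute" here is just the inherited default; the forward pass ignores it.
print(getattr(config, "position_embedding_type", None))

# RoPE settings are what the model actually uses; the exact key names are an
# assumption based on the config.json linked above.
print(getattr(config, "global_rope_theta", None))
print(getattr(config, "local_rope_theta", None))
```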