Performance oddities of the 3M model.
#3 opened 8 months ago
by
MartialTerran
GPT-2 model having16 4-float attention heads
#2 opened 8 months ago
by
MartialTerran
Adding `safetensors` variant of this model
#1 opened about 1 year ago
by
SFconvertbot