Update config.json for flan-t5-small

#13
by petermca - opened

I believe the num_heads and num_layers values are swapped for google/flan-t5-small. See the comparison with t5-small (link below), which flan-t5-small is based on. With the current values, the model's hidden size isn't divisible by the number of attention heads (512 % 6 = 2).

https://huggingface.co./t5-small/blob/df1b051c49625cf57a3d0d8d3863ed4d13564fe4/config.json#L16

The t5-small implementation is not aligned with the original paper: "Small. We consider a smaller model, which scales the baseline down by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder."

The actual config is:

```
network.T5Config:
  emb_dim = 512
  num_heads = 6
  num_encoder_layers = 8
  num_decoder_layers = 8
  head_dim = 64
  mlp_dim = 1024
```

https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin
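For reference, here is a sketch of how those t5x gin values map onto the Hugging Face T5Config fields (the name mapping is my reading of the two configs, not something stated in this thread). Note that T5 sets the per-head dimension explicitly via d_kv, so the attention inner dimension is num_heads * d_kv, projected from d_model; divisibility of d_model by num_heads isn't actually required:

```python
from transformers import T5Config

# Assumed mapping from the t5x gin names above to HF config keys:
cfg = T5Config(
    d_model=512,           # emb_dim
    d_kv=64,               # head_dim (per-head dim, set explicitly)
    d_ff=1024,             # mlp_dim
    num_heads=6,           # num_heads
    num_layers=8,          # num_encoder_layers
    num_decoder_layers=8,  # num_decoder_layers
)

# Attention inner dimension: projected from d_model, not a slice of it.
print(cfg.num_heads * cfg.d_kv)  # 384, while d_model stays 512
```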

Any clues as to why the config was changed?
