Update config.json for flan-t5-small
I believe the num_heads and num_layers values are swapped for google/flan-t5-small. See the comparison with t5-small (link below), which flan-t5-small is based on. With the current values, the hidden size of the model isn't divisible by the number of attention heads (512 % 6 = 2).
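For what it's worth, here's a minimal sketch (assuming the `transformers` library is installed) that reproduces the check against the config currently published on the Hub:

```python
from transformers import AutoConfig

# Pull the published config for flan-t5-small from the Hub.
config = AutoConfig.from_pretrained("google/flan-t5-small")

print(config.d_model)                     # hidden size: 512
print(config.num_heads)                   # attention heads: 6 at the time of writing
print(config.num_layers)                  # encoder layers: 8 at the time of writing
print(config.d_model % config.num_heads)  # 512 % 6 = 2, i.e. not divisible
```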
The t5-small implementation is not aligned with the original paper, which describes the small variant as: "Small. We consider a smaller model, which scales the baseline down by using d_model = 512, d_ff = 2,048, 8-headed attention, and only 6 layers each in the encoder and decoder."
The actual config is:
```gin
network.T5Config:
  emb_dim = 512
  num_heads = 6
  num_encoder_layers = 8
  num_decoder_layers = 8
  head_dim = 64
  mlp_dim = 1024
```
https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t5_1_1/small.gin
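Note also that head_dim is set explicitly in this gin file, so the divisibility concern may not apply: in the t5x implementation, attention projects to num_heads * head_dim rather than splitting emb_dim evenly across heads. A quick sketch of the arithmetic using the gin values above:

```python
# Values from the t5_1_1/small.gin linked above.
emb_dim = 512
num_heads = 6
head_dim = 64

# t5x attention projects emb_dim -> num_heads * head_dim, so emb_dim
# does not need to be divisible by num_heads when head_dim is explicit.
inner_dim = num_heads * head_dim
print(inner_dim)            # 384, independent of emb_dim
print(emb_dim % num_heads)  # 2, but harmless given the explicit head_dim
```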
Any clues as to why the config was changed?