Can this run on FLAN t5?
#5, opened by ljhwild
I'm reading the paper, and it appears that LongT5 is built on T5 rather than on Flan-T5.
Is there a reason for that?
Hello! T5 and Flan-T5 share the same model architecture. You can see in Flan-T5's config file that it uses the T5 architecture under the hood: https://huggingface.co/google/flan-t5-xxl/blob/main/config.json#L3
However, LongT5 has a slightly different architecture so that it can scale to longer input sequences: its encoder replaces full self-attention with local or transient-global attention.
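To make the difference concrete, here is a minimal sketch that compares the architecture fields of two config files. The JSON strings are abbreviated, illustrative excerpts written from memory of what the Flan-T5 and LongT5 configs on the Hub contain (the real files have many more fields, so please check them yourself), and `summarize` is a hypothetical helper, not part of any library.

```python
import json

# Abbreviated, illustrative excerpts (assumption: verify against the real
# config.json files on the Hub, which contain many more fields).
FLAN_T5_CONFIG = """
{"architectures": ["T5ForConditionalGeneration"], "model_type": "t5"}
"""
LONG_T5_CONFIG = """
{"architectures": ["LongT5ForConditionalGeneration"],
 "model_type": "longt5",
 "encoder_attention_type": "transient-global"}
"""


def summarize(config_text):
    """Hypothetical helper: pull out the fields that identify the architecture."""
    cfg = json.loads(config_text)
    return {
        "architecture": cfg["architectures"][0],
        "model_type": cfg["model_type"],
        # Plain T5 configs carry no such key; "full" is our own label for
        # ordinary full self-attention, not a value taken from the config.
        "encoder_attention": cfg.get("encoder_attention_type", "full"),
    }


flan = summarize(FLAN_T5_CONFIG)
long_t5 = summarize(LONG_T5_CONFIG)

print(flan)
print(long_t5)
# Different model types mean the checkpoints are not interchangeable.
print("same model type:", flan["model_type"] == long_t5["model_type"])
```

If you have `transformers` installed, `AutoConfig.from_pretrained` should let you load the real configs and inspect the same fields instead of the hand-written excerpts.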
Hope that helps!