Could someone help me obtain the attention weights that the model gives to the input tokens?