What Happens If the Prompt Exceeds 8,196 Tokens? And What Is the Difference Between the Input Limit and the Context Length Limit?

#36
by averyyu99

Dear community members, I found that the maximum token limit for a prompt is 8,196 tokens. What happens if I provide a prompt longer than this limit? Will the prompt be automatically truncated, with only the first 8,196 tokens being processed? I tested this and didn't encounter any errors, so I'm wondering how the model handles prompts that exceed the limit.

Also, I'm curious about the difference between the input limit and the context length limit. Since LLaMA 3 has a context length of 128k tokens, does that mean we can use iterative prompting strategies to process longer texts effectively? If so, how does the model handle prompts that exceed the input limit within a single request?
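For reference, this is roughly the kind of chunked, iterative workflow I have in mind (a sketch only; the model ID, chunk size, and pipeline usage are placeholders I have not validated):

```python
from transformers import AutoTokenizer, pipeline

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder; any chat model would do
CHUNK_TOKENS = 4000  # stay well under the per-request limit

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
generator = pipeline("text-generation", model=MODEL_ID, tokenizer=tokenizer)

def summarize_long_text(text: str) -> str:
    # Split the document into token-bounded chunks so each request fits the window.
    ids = tokenizer.encode(text)
    chunks = [ids[i:i + CHUNK_TOKENS] for i in range(0, len(ids), CHUNK_TOKENS)]

    partial_summaries = []
    for chunk in chunks:
        prompt = "Summarize the following text:\n\n" + tokenizer.decode(chunk)
        out = generator(prompt, max_new_tokens=256, return_full_text=False)
        partial_summaries.append(out[0]["generated_text"])

    # Second pass: merge the per-chunk summaries into one final answer.
    merge_prompt = "Combine these partial summaries into one:\n\n" + "\n".join(partial_summaries)
    return generator(merge_prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
```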

Any help or explanation is appreciated! Thanks : )

Please note that the context window length is the same as the maximum input prompt length, and for this model the context window is 130K tokens, as defined here. As such, the maximum token limit for a prompt is 130K, not 8,196.
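You can confirm this locally by reading the configured context window and counting your prompt's tokens. A minimal sketch, assuming the Hugging Face transformers library; the model ID below is illustrative, so substitute the repository being discussed:

```python
from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative; use the model in question

config = AutoConfig.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# max_position_embeddings is the context window the model was configured with.
print("context window:", config.max_position_embeddings)

prompt = "Your long prompt text here."
print("prompt tokens:", len(tokenizer.encode(prompt)))
```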

However, a very long prompt will consume more GPU memory, since the attention key/value cache grows with the number of input tokens.
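As a rough illustration of how that memory grows with prompt length, the KV cache scales linearly with the token count. The layer, head, and dtype numbers below are assumptions for an 8B-class model in fp16/bf16, not exact figures for this checkpoint:

```python
# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes per element * tokens
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,      # assumed for an 8B-class model
                   num_kv_heads: int = 8,     # grouped-query attention heads (assumption)
                   head_dim: int = 128,
                   bytes_per_elem: int = 2):  # fp16/bf16
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens

for n in (8_192, 32_768, 131_072):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB of KV cache")
```

With these assumed values, an 8K-token prompt needs about 1 GiB of KV cache, while a 128K-token prompt needs about 16 GiB, on top of the model weights themselves.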

Best regards,

Shuyue
Dec. 18th, 2024
