Long context. Running on multiple GPUs
#49
by averoo - opened
Hello!
Could you please advise me and the community on how to run this model in a distributed manner?
I want to feed in a long context (around 40k tokens), but the model tries to allocate too much memory (around 150 GB of GPU RAM). I have that much memory in total, but it is spread across several cards.
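For reference, something along these lines is what I have in mind, assuming the standard transformers/accelerate sharding via device_map="auto" (the model name and generation parameters below are just placeholders, not from this repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "org/model-name"  # placeholder, substitute the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" (requires the accelerate package) splits the layers
# across all visible GPUs instead of loading everything onto one card.
# bfloat16 halves the weight memory compared to float32.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

long_text = "..."  # the ~40k-token input
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is this the right approach here, or does this model need something else (tensor parallelism, a different attention implementation, etc.) to fit a 40k-token context?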