Model inference speed (#2)
opened by halsayed
@halsayed Thanks for using Jais. You may get better inference speed with 2 x A100 80GB GPUs: the model has roughly 30B parameters, so its fp32 weights take about 30 × 4 ≈ 120 GB, and all layers fit across two such GPUs.
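A minimal sketch of loading the model across both GPUs with `device_map="auto"` in `transformers`, which shards the layers automatically. The checkpoint id below is an assumption for illustration; use the exact id from the model card you are working with.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inceptionai/jais-30b-chat-v1"  # hypothetical id, check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # the ~120 GB figure above assumes fp32 weights
    device_map="auto",          # shard layers across the 2 x A100 80GB
    trust_remote_code=True,     # Jais ships custom modeling code
)

prompt = "What is the capital of the UAE?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in `torch.bfloat16` instead of fp32 would already halve the footprint to roughly 60 GB, which is why dtype is worth checking before reaching for quantization.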
@samta-kamboj Thanks, adding a second GPU solved the problem. Has there been any attempt to quantize the model to reduce its VRAM footprint?
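The thread does not confirm an official quantized release, but as an illustration, one common route is 4-bit loading via `bitsandbytes` through the standard `transformers` config. The model id is the same assumption as above, and whether the custom Jais modeling code supports this path is itself an assumption.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "inceptionai/jais-30b-chat-v1"  # hypothetical id, check the model card

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # ~0.5 byte/param: ~15 GB of weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4, a common default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

At ~15 GB of 4-bit weights the model would fit on a single A100 80GB with room for activations and KV cache, at some cost in output quality that would need to be evaluated.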