After training LlamaGuard-7b inference is slower
#23
by
jamesoneill12
- opened
Hi there,
I am having trouble matching the original model's inference latency after fine-tuning it in bfloat16 (and I assume other dtypes would be affected too).
Does LlamaGuard-7b use any tricks to make inference faster that could potentially be lost after uploading a fine-tuned version of it to the Hub?
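For context, my current suspicion is the usual dtype pitfall: a checkpoint saved in bf16 can silently come back as fp32 on reload, which roughly doubles memory traffic per matmul and slows generation. A minimal sketch of what I mean (the tensors here are placeholders, not the actual checkpoint):

```python
import torch

# A stand-in for a state dict saved in bfloat16 after fine-tuning.
state_dict = {"weight": torch.randn(8, 8, dtype=torch.bfloat16)}

# Simulate reloading WITHOUT an explicit dtype: weights get upcast to
# fp32, so each parameter takes twice the bytes of its bf16 original.
loaded_fp32 = {k: v.to(torch.float32) for k, v in state_dict.items()}
print(loaded_fp32["weight"].dtype)   # torch.float32

# Keeping bf16 on reload avoids the upcast.
loaded_bf16 = {k: v.clone() for k, v in state_dict.items()}
print(loaded_bf16["weight"].dtype)   # torch.bfloat16
```

In `transformers` terms, I believe the equivalent fix would be passing `torch_dtype=torch.bfloat16` to `from_pretrained` and then sanity-checking `next(model.parameters()).dtype`, but I may be missing something specific to this model.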
Thanks,
James