HuggingFaceH4/zephyr-7b-alpha · Not working in Text Generation Web UI

Oct 17, 2023

I tried all model loaders with this model but it failed to load. Any ideas how to get it to load? Thanks.

Oct 17, 2023

i run it no problem using the blokes quantized version in gguf file format using llama.cpp loader. I use Q6_k quantized file. i get about 10 tokens/s

schadha

Oct 17, 2023

Try on this Colab: https://colab.research.google.com/drive/18XH8DTbgI4Zrsg-Xat-El3FvL8ZIDXMD

Change
llm_chain = LLMChain(prompt=prompt,
llm=HuggingFaceHub(repo_id="google/flan-t5-xl",
model_kwargs={"temperature":0,
"max_length":64}))

question = " what is capital of France?"
print(llm_chain.run(question))

to

llm_chain = LLMChain(prompt=prompt,
llm=HuggingFaceHub(repo_id="HuggingFaceH4/zephyr-7b-alpha",
model_kwargs={"temperature":0.7, # NOTE
"max_length":64}))

question = " what is capital of France?"
print(llm_chain.run(question))

Answers were OK not compared to this online chat https://huggingface.co./spaces/HuggingFaceH4/zephyr-chat