TheBloke/Mistral-7B-Instruct-v0.1-AWQ

TheBloke committed on Sep 29, 2023

Commit 2e0aebe • 1 Parent(s): a68dbfd

Update README.md

Files changed (1)

README.md +1 -41

README.md CHANGED

@@ -49,9 +49,7 @@ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization metho
 These are experimental first AWQs for the brand-new model format, Mistral.
-As of September 29th 2023, they are supported by AutoAWQ, and vLLM (version 0.2).
-To use from AutoAWQ requires installing both AutoAWQ and Transformers from Github. More details are below.
 <!-- description end -->
 <!-- repositories-available start -->
@@ -86,44 +84,6 @@ Models are released as sharded safetensors files.
 <!-- README_AWQ.md-provided-files end -->
-<!-- README_AWQ.md-use-from-vllm start -->
-## Serving this model from vLLM
-Make sure you are using vLLM version 0.2.
-Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
-- When using vLLM as a server, pass the `--quantization awq` parameter, for example:
-```shell
-python3 python -m vllm.entrypoints.api_server --model TheBloke/Mistral-7B-Instruct-v0.1-AWQ --quantization awq --dtype float16
-```
-When using vLLM from Python code, pass the `quantization=awq` parameter, for example:
-```python
-from vllm import LLM, SamplingParams
-prompts = [
-    "Hello, my name is",
-    "The president of the United States is",
-    "The capital of France is",
-    "The future of AI is",
-]
-sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-llm = LLM(model="TheBloke/Mistral-7B-Instruct-v0.1-AWQ", quantization="awq", dtype="float16")
-outputs = llm.generate(prompts, sampling_params)
-# Print the outputs.
-for output in outputs:
-    prompt = output.prompt
-    generated_text = output.outputs[0].text
-    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
-```
-<!-- README_AWQ.md-use-from-vllm start -->
 <!-- README_AWQ.md-use-from-python start -->
 ## How to use this AWQ model from Python code

 These are experimental first AWQs for the brand-new model format, Mistral.
+As of September 29th 2023, they are only supported by AutoAWQ (version 0.1.1+)
 <!-- description end -->