danielsteinigen
committed on
add sample for usage with vLLM to Readme
README.md
CHANGED
@@ -131,6 +131,47 @@ print(prediction_text)
This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.

### Usage with vLLM Server

Starting the vLLM Server:

``` shell
vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code
```

Use the Chat API with vLLM, passing the language of the chat template in the request's `extra_body`:

``` python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
    extra_body={"chat_template": "DE"},
)
print(f"Assistant: {completion.choices[0].message.content}")
```
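
Because the template language is chosen per request, a single server can answer in several of the model's languages. A minimal sketch, assuming the model also ships an `EN` chat template alongside `DE`:

``` python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# Assumption: an "EN" chat template is available in addition to "DE".
for lang, prompt in [("DE", "Hallo"), ("EN", "Hello")]:
    completion = client.chat.completions.create(
        model="openGPT-X/Teuken-7B-instruct-research-v0.4",
        messages=[{"role": "User", "content": prompt}],
        extra_body={"chat_template": lang},
    )
    print(f"[{lang}] Assistant: {completion.choices[0].message.content}")
```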

The default language of the chat template can also be set when starting the vLLM server. To do this, create a new file named `lang` containing `DE`, and start the server as follows:

``` shell
vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code --chat-template lang
```
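
With a default template configured on the server, clients can omit `extra_body`. A minimal sketch, assuming the server started above is running and falls back to the `DE` default:

``` python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# No extra_body here: the server applies its default chat template (DE).
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
)
print(f"Assistant: {completion.choices[0].message.content}")
```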

### Usage with vLLM Offline Batched Inference

``` python
from vllm import LLM, SamplingParams

# Near-greedy decoding; stop generation at the end-of-sequence token.
sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
llm = LLM(model="openGPT-X/Teuken-7B-instruct-research-v0.4", trust_remote_code=True, dtype="bfloat16")
outputs = llm.chat(
    messages=[{"role": "User", "content": "Hallo"}],
    sampling_params=sampling_params,
    chat_template="DE",
)
print(f"Prompt: {outputs[0].prompt}")
print(f"Assistant: {outputs[0].outputs[0].text}")
```
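
Offline requests batch naturally: recent vLLM releases also accept a list of conversations in `LLM.chat`. A sketch under that assumption, reusing `llm` and `sampling_params` from the example above:

``` python
# Assumption: this vLLM version supports a list of conversations in LLM.chat.
conversations = [
    [{"role": "User", "content": "Hallo"}],
    [{"role": "User", "content": "Wie heißt die Hauptstadt von Deutschland?"}],
]
outputs = llm.chat(
    messages=conversations,
    sampling_params=sampling_params,
    chat_template="DE",
)
for output in outputs:
    print(f"Assistant: {output.outputs[0].text}")
```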

## Training Details

### Pre-Training Data