Commit a7e67f6 (verified) · danielsteinigen · 1 parent: 0aaf3bb

add sample for usage with vLLM to Readme

Files changed (1):
  1. README.md +41 -0
README.md CHANGED
@@ -131,6 +131,47 @@ print(prediction_text)
 
 This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.
 
+ ### Usage with vLLM Server
+ Starting the vLLM Server:
+ ``` shell
+ vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code
+ ```
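If the server needs a quick check before sending chat requests, a minimal sketch (assuming the default port 8000 and vLLM's OpenAI-compatible `/v1/models` route) that lists the served models:
``` python
# Minimal smoke test: list the models exposed by the vLLM server started above.
# Assumes the server is reachable on the default port 8000.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
print([model.id for model in client.models.list()])
```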
+ Use the Chat API with vLLM and pass the language of the chat template as extra body:
+ ``` python
+ from openai import OpenAI
+
+ client = OpenAI(
+     api_key="EMPTY",
+     base_url="http://localhost:8000/v1",
+ )
+ completion = client.chat.completions.create(
+     model="openGPT-X/Teuken-7B-instruct-research-v0.4",
+     messages=[{"role": "User", "content": "Hallo"}],
+     extra_body={"chat_template": "DE"}
+ )
+ print(f"Assistant: {completion.choices[0].message.content}")
+ ```
+ The default language of the chat template can also be set when starting the vLLM server. To do this, create a new file named `lang` with the content `DE` and start the vLLM server as follows:
+ ``` shell
+ vllm serve openGPT-X/Teuken-7B-instruct-research-v0.4 --trust-remote-code --chat-template lang
+ ```
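With the default set this way, the chat request no longer needs the `extra_body` parameter; a minimal sketch of the same call relying on the server-side default template:
``` python
# Assumes the server was started with `--chat-template lang` as shown above,
# so the German chat template is applied by default and no extra_body is needed.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
completion = client.chat.completions.create(
    model="openGPT-X/Teuken-7B-instruct-research-v0.4",
    messages=[{"role": "User", "content": "Hallo"}],
)
print(f"Assistant: {completion.choices[0].message.content}")
```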
+
+ ### Usage with vLLM Offline Batched Inference
+ ``` python
+ from vllm import LLM, SamplingParams
+
+ sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
+ llm = LLM(model="openGPT-X/Teuken-7B-instruct-research-v0.4", trust_remote_code=True, dtype="bfloat16")
+ outputs = llm.chat(
+     messages=[{"role": "User", "content": "Hallo"}],
+     sampling_params=sampling_params,
+     chat_template="DE"
+ )
+ print(f"Prompt: {outputs[0].prompt}")
+ print(f"Assistant: {outputs[0].outputs[0].text}")
+ ```
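The example above runs a single conversation; as a minimal sketch, assuming a vLLM version whose `LLM.chat` accepts a list of conversations, several prompts can be batched in one call:
``` python
# Assumes a vLLM release in which LLM.chat accepts a list of conversations,
# so all prompts are scheduled together in one batched generation call.
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
llm = LLM(model="openGPT-X/Teuken-7B-instruct-research-v0.4", trust_remote_code=True, dtype="bfloat16")

conversations = [
    [{"role": "User", "content": "Hallo"}],
    [{"role": "User", "content": "Was ist die Hauptstadt von Deutschland?"}],
]
outputs = llm.chat(
    messages=conversations,
    sampling_params=sampling_params,
    chat_template="DE",
)
for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Assistant: {output.outputs[0].text}")
```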
+
+
 ## Training Details
 
 ### Pre-Training Data