nonhuman committed
Commit 4ec8dba • 1 Parent(s): e99760a

Upload 225 files

This view is limited to 50 files because it contains too many changes.

Files changed (50)
Files changed (50) hide show
  1. .gitattributes +2 -0
  2. docs/my-website/.gitignore +20 -0
  3. docs/my-website/Dockerfile +9 -0
  4. docs/my-website/README.md +41 -0
  5. docs/my-website/babel.config.js +3 -0
  6. docs/my-website/blog/2021-08-26-welcome/index.md +43 -0
  7. docs/my-website/docs/budget_manager.md +248 -0
  8. docs/my-website/docs/caching/caching_api.md +75 -0
  9. docs/my-website/docs/caching/local_caching.md +92 -0
  10. docs/my-website/docs/caching/redis_cache.md +73 -0
  11. docs/my-website/docs/completion/batching.md +182 -0
  12. docs/my-website/docs/completion/config.md +49 -0
  13. docs/my-website/docs/completion/function_call.md +545 -0
  14. docs/my-website/docs/completion/input.md +582 -0
  15. docs/my-website/docs/completion/message_trimming.md +36 -0
  16. docs/my-website/docs/completion/mock_requests.md +72 -0
  17. docs/my-website/docs/completion/model_alias.md +53 -0
  18. docs/my-website/docs/completion/multiple_deployments.md +53 -0
  19. docs/my-website/docs/completion/output.md +68 -0
  20. docs/my-website/docs/completion/prompt_formatting.md +86 -0
  21. docs/my-website/docs/completion/reliable_completions.md +196 -0
  22. docs/my-website/docs/completion/stream.md +76 -0
  23. docs/my-website/docs/completion/token_usage.md +154 -0
  24. docs/my-website/docs/contact.md +6 -0
  25. docs/my-website/docs/debugging/hosted_debugging.md +91 -0
  26. docs/my-website/docs/debugging/local_debugging.md +64 -0
  27. docs/my-website/docs/default_code_snippet.md +22 -0
  28. docs/my-website/docs/embedding/async_embedding.md +15 -0
  29. docs/my-website/docs/embedding/moderation.md +10 -0
  30. docs/my-website/docs/embedding/supported_embedding.md +201 -0
  31. docs/my-website/docs/exception_mapping.md +102 -0
  32. docs/my-website/docs/extras/contributing.md +49 -0
  33. docs/my-website/docs/getting_started.md +100 -0
  34. docs/my-website/docs/index.md +402 -0
  35. docs/my-website/docs/langchain/langchain.md +135 -0
  36. docs/my-website/docs/migration.md +35 -0
  37. docs/my-website/docs/observability/callbacks.md +35 -0
  38. docs/my-website/docs/observability/custom_callback.md +358 -0
  39. docs/my-website/docs/observability/helicone_integration.md +55 -0
  40. docs/my-website/docs/observability/langfuse_integration.md +105 -0
  41. docs/my-website/docs/observability/langsmith_integration.md +77 -0
  42. docs/my-website/docs/observability/llmonitor_integration.md +65 -0
  43. docs/my-website/docs/observability/promptlayer_integration.md +77 -0
  44. docs/my-website/docs/observability/sentry.md +44 -0
  45. docs/my-website/docs/observability/slack_integration.md +93 -0
  46. docs/my-website/docs/observability/supabase_integration.md +101 -0
  47. docs/my-website/docs/observability/telemetry.md +13 -0
  48. docs/my-website/docs/observability/traceloop_integration.md +34 -0
  49. docs/my-website/docs/observability/wandb_integration.md +51 -0
  50. docs/my-website/docs/projects.md +19 -0
.gitattributes ADDED
@@ -0,0 +1,2 @@
1
+ docs/my-website/img/alt_dashboard.png filter=lfs diff=lfs merge=lfs -text
2
+ docs/my-website/img/dashboard_log.png filter=lfs diff=lfs merge=lfs -text
docs/my-website/.gitignore ADDED
@@ -0,0 +1,20 @@
1
+ # Dependencies
2
+ /node_modules
3
+
4
+ # Production
5
+ /build
6
+
7
+ # Generated files
8
+ .docusaurus
9
+ .cache-loader
10
+
11
+ # Misc
12
+ .DS_Store
13
+ .env.local
14
+ .env.development.local
15
+ .env.test.local
16
+ .env.production.local
17
+
18
+ npm-debug.log*
19
+ yarn-debug.log*
20
+ yarn-error.log*
docs/my-website/Dockerfile ADDED
@@ -0,0 +1,9 @@
1
+ FROM python:3.10
2
+
3
+ COPY . /app
4
+ WORKDIR /app
5
+ RUN pip install -r requirements.txt
6
+
7
+ EXPOSE $PORT
8
+
9
+ CMD litellm --host 0.0.0.0 --port $PORT --workers 10 --config config.yaml
docs/my-website/README.md ADDED
@@ -0,0 +1,41 @@
1
+ # Website
2
+
3
+ This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
4
+
5
+ ### Installation
6
+
7
+ ```
8
+ $ yarn
9
+ ```
10
+
11
+ ### Local Development
12
+
13
+ ```
14
+ $ yarn start
15
+ ```
16
+
17
+ This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.
18
+
19
+ ### Build
20
+
21
+ ```
22
+ $ yarn build
23
+ ```
24
+
25
+ This command generates static content into the `build` directory and can be served using any static contents hosting service.
26
+
27
+ ### Deployment
28
+
29
+ Using SSH:
30
+
31
+ ```
32
+ $ USE_SSH=true yarn deploy
33
+ ```
34
+
35
+ Not using SSH:
36
+
37
+ ```
38
+ $ GIT_USER=<Your GitHub username> yarn deploy
39
+ ```
40
+
41
+ If you are using GitHub pages for hosting, this command is a convenient way to build the website and push to the `gh-pages` branch.
docs/my-website/babel.config.js ADDED
@@ -0,0 +1,3 @@
1
+ module.exports = {
2
+ presets: [require.resolve('@docusaurus/core/lib/babel/preset')],
3
+ };
docs/my-website/blog/2021-08-26-welcome/index.md ADDED
@@ -0,0 +1,43 @@
1
+ # 🚅 litellm
2
+ A lightweight, ~100-line package that simplifies calling the OpenAI, Azure, Cohere, and Anthropic APIs
3
+
4
+ ###### litellm manages:
5
+ * Calling all LLM APIs using the OpenAI format - `completion(model, messages)`
6
+ * Consistent output across all LLM APIs: text responses are always available at `['choices'][0]['message']['content']`
7
+ * Consistent exceptions across all LLM APIs: RateLimit, Context Window, and Authentication errors from every provider are mapped to their OpenAI equivalents. [see Code](https://github.com/BerriAI/litellm/blob/ba1079ff6698ef238c5c7f771dd2b698ec76f8d9/litellm/utils.py#L250)
8
+
9
+ ###### observability:
10
+ * Logging - see exactly what the raw model request/response is, either by plugging in your own function (`completion(.., logger_fn=your_logging_fn)`) or by turning on the package's print statements (`litellm.set_verbose=True`)
11
+ * Callbacks - automatically send your data to Helicone, Sentry, Posthog, Slack - `litellm.success_callbacks`, `litellm.failure_callbacks` [see Callbacks](https://litellm.readthedocs.io/en/latest/advanced/)
12
+
13
+ ## Quick Start
14
+ Go directly to code: [Getting Started Notebook](https://colab.research.google.com/drive/1gR3pY-JzDZahzpVdbGBtrNGDBmzUNJaJ?usp=sharing)
15
+ ### Installation
16
+ ```
17
+ pip install litellm
18
+ ```
19
+
20
+ ### Usage
21
+ ```python
22
+ import os
+ from litellm import completion
23
+
24
+ ## set ENV variables
25
+ os.environ["OPENAI_API_KEY"] = "openai key"
26
+ os.environ["COHERE_API_KEY"] = "cohere key"
27
+
28
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
29
+
30
+ # openai call
31
+ response = completion(model="gpt-3.5-turbo", messages=messages)
32
+
33
+ # cohere call
34
+ response = completion("command-nightly", messages)
35
+ ```
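Because of the "consistent exceptions" point above, provider errors surface as OpenAI-equivalent exception types. A minimal sketch of what that looks like in practice (the invalid key here is deliberate, and only the exception's type and message are inspected, so no specific exception class is assumed):

```python
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "bad-key"  # deliberately invalid, to trigger an auth error

try:
    completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi"}])
except Exception as e:
    # litellm maps provider errors to their OpenAI equivalents,
    # so the exception class name tells you what went wrong
    print(type(e).__name__, str(e))
```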
36
+ Need help / support? [See troubleshooting](https://litellm.readthedocs.io/en/latest/troubleshoot)
37
+
38
+ ## Why did we build liteLLM
39
+ - **Need for simplicity**: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI, and Cohere.
40
+
41
+ ## Support
42
+ * [Meet with us 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
43
docs/my-website/docs/budget_manager.md ADDED
@@ -0,0 +1,248 @@
1
+ import Tabs from '@theme/Tabs';
2
+ import TabItem from '@theme/TabItem';
3
+
4
+ # Budget Manager
5
+
6
+ Don't want to run up crazy bills, either while you're calling LLM APIs **or** while your users are calling them? Use this.
7
+
8
+ LiteLLM exposes:
9
+ * `litellm.max_budget`: a global variable you can use to set the max budget (in USD) across all your litellm calls. If this budget is exceeded, a `BudgetExceededError` is raised (see the sketch after the quick start for catching it).
10
+ * `BudgetManager`: A class to help set budgets per user. BudgetManager creates a dictionary to manage the user budgets, where the key is the user and the value is their current cost + model-specific costs.
11
+
12
+ ## quick start
13
+
14
+ ```python
15
+ import litellm, os
16
+ from litellm import completion
17
+
18
+ # set env variable
19
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
20
+
21
+ litellm.max_budget = 0.001 # sets a max budget of $0.001
22
+
23
+ messages = [{"role": "user", "content": "Hey, how's it going"}]
24
+ completion(model="gpt-4", messages=messages)
25
+ print(litellm._current_cost)
26
+ completion(model="gpt-4", messages=messages)
27
+ ```
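Once the tracked cost goes past `litellm.max_budget`, further calls raise the budget error mentioned above. A minimal sketch for catching it, assuming the exception is exposed as `litellm.BudgetExceededError`:

```python
import os
import litellm
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"
litellm.max_budget = 0.001  # max budget of $0.001 across all litellm calls

messages = [{"role": "user", "content": "Hey, how's it going"}]
try:
    for _ in range(5):
        completion(model="gpt-4", messages=messages)
except litellm.BudgetExceededError as e:
    # raised once the tracked cost (litellm._current_cost) exceeds litellm.max_budget
    print("budget exceeded:", e)
```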
28
+
29
+ ## User-based rate limiting
30
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/LiteLLM_User_Based_Rate_Limits.ipynb">
31
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
32
+ </a>
33
+
34
+ ```python
35
+ from litellm import BudgetManager, completion
36
+
37
+ budget_manager = BudgetManager(project_name="test_project")
38
+
39
+ user = "1234"
40
+
41
+ # create a budget if this is a new user
42
+ if not budget_manager.is_valid_user(user):
43
+ budget_manager.create_budget(total_budget=10, user=user)
44
+
45
+ # check if a given call can be made
46
+ if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
47
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
48
+ budget_manager.update_cost(completion_obj=response, user=user)
49
+ else:
50
+ response = "Sorry - no budget!"
51
+ ```
52
+
53
+ [**Implementation Code**](https://github.com/BerriAI/litellm/blob/main/litellm/budget_manager.py)
54
+
55
+ ## use with Text Input / Output
56
+
57
+ Update cost by just passing in the text input / output and model name.
58
+
59
+ ```python
60
+ from litellm import BudgetManager
61
+
62
+ budget_manager = BudgetManager(project_name="test_project")
63
+ user = "12345"
64
+ budget_manager.create_budget(total_budget=10, user=user, duration="daily")
65
+
66
+ input_text = "hello world"
67
+ output_text = "it's a sunny day in san francisco"
68
+ model = "gpt-3.5-turbo"
69
+
70
+ budget_manager.update_cost(user=user, model=model, input_text=input_text, output_text=output_text) # 👈
71
+ print(budget_manager.get_current_cost(user))
72
+ ```
73
+
74
+ ## advanced usage
75
+ In production, we will need to
76
+ * store user budgets in a database
77
+ * reset user budgets based on a set duration
78
+
79
+
80
+
81
+ ### LiteLLM API
82
+
83
+ The LiteLLM API provides both. It stores the user object in a hosted db, and runs a cron job daily to reset user-budgets based on the set duration (e.g. reset budget daily/weekly/monthly/etc.).
84
+
85
+ **Usage**
86
+ ```python
87
+ budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted")
88
+ ```
89
+
90
+ **Complete Code**
91
+ ```python
92
+ from litellm import BudgetManager, completion
93
+
94
+ budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted")
95
+
96
+ user = "1234"
97
+
98
+ # create a budget if this is a new user
99
+ if not budget_manager.is_valid_user(user):
100
+ budget_manager.create_budget(total_budget=10, user=user, duration="monthly") # 👈 duration = 'daily'/'weekly'/'monthly'/'yearly'
101
+
102
+ # check if a given call can be made
103
+ if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
104
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
105
+ budget_manager.update_cost(completion_obj=response, user=user)
106
+ else:
107
+ response = "Sorry - no budget!"
108
+ ```
109
+
110
+ ### Self-hosted
111
+
112
+ To use your own db, set the BudgetManager client type to `hosted` **and** set the api_base.
113
+
114
+ Your api is expected to expose `/get_budget` and `/set_budget` endpoints. [See code for details](https://github.com/BerriAI/litellm/blob/27f1051792176a7eb1fe3b72b72bccd6378d24e9/litellm/budget_manager.py#L7)
115
+
116
+ **Usage**
117
+ ```python
118
+ budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted", api_base="your_custom_api")
119
+ ```
120
+ **Complete Code**
121
+ ```python
122
+ from litellm import BudgetManager, completion
123
+
124
+ budget_manager = BudgetManager(project_name="<my-unique-project>", client_type="hosted", api_base="your_custom_api")
125
+
126
+ user = "1234"
127
+
128
+ # create a budget if this is a new user
129
+ if not budget_manager.is_valid_user(user):
130
+ budget_manager.create_budget(total_budget=10, user=user, duration="monthly") # 👈 duration = 'daily'/'weekly'/'monthly'/'yearly'
131
+
132
+ # check if a given call can be made
133
+ if budget_manager.get_current_cost(user=user) <= budget_manager.get_total_budget(user):
134
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey, how's it going?"}])
135
+ budget_manager.update_cost(completion_obj=response, user=user)
136
+ else:
137
+ response = "Sorry - no budget!"
138
+ ```
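The exact request/response schema for `/get_budget` and `/set_budget` is defined by the implementation code linked above; the sketch below is only a hypothetical FastAPI stub to illustrate the shape of a self-hosted budget service, and the payload field names are illustrative, not the actual contract:

```python
# Hypothetical stub for a self-hosted budget API.
# NOTE: the real request/response schema lives in litellm's budget_manager.py (linked above);
# the field names below are illustrative only.
from fastapi import FastAPI, Request

app = FastAPI()
budgets = {}  # in production, back this with your own database

@app.post("/get_budget")
async def get_budget(request: Request):
    data = await request.json()
    return {"data": budgets.get(data.get("project_name"), {})}

@app.post("/set_budget")
async def set_budget(request: Request):
    data = await request.json()
    budgets[data.get("project_name")] = data.get("data", {})
    return {"status": "success"}
```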
139
+
140
+ ## Budget Manager Class
141
+ The `BudgetManager` class is used to manage budgets for different users. It provides various functions to create, update, and retrieve budget information.
142
+
143
+ Below is a list of public functions exposed by the Budget Manager class and their input/outputs.
144
+
145
+ ### __init__
146
+ ```python
147
+ def __init__(self, project_name: str, client_type: str = "local", api_base: Optional[str] = None)
148
+ ```
149
+ - `project_name` (str): The name of the project.
150
+ - `client_type` (str): The client type ("local" or "hosted"). Defaults to "local".
151
+ - `api_base` (Optional[str]): The base URL of the API. Defaults to None.
152
+
153
+
154
+ ### create_budget
155
+ ```python
156
+ def create_budget(self, total_budget: float, user: str, duration: Literal["daily", "weekly", "monthly", "yearly"], created_at: float = time.time())
157
+ ```
158
+ Creates a budget for a user.
159
+
160
+ - `total_budget` (float): The total budget of the user.
161
+ - `user` (str): The user id.
162
+ - `duration` (Literal["daily", "weekly", "monthly", "yearly"]): The budget duration.
163
+ - `created_at` (float): The creation time. Default is the current time.
164
+
165
+ ### projected_cost
166
+ ```python
167
+ def projected_cost(self, model: str, messages: list, user: str)
168
+ ```
169
+ Computes the projected cost for a session.
170
+
171
+ - `model` (str): The name of the model.
172
+ - `messages` (list): The list of messages.
173
+ - `user` (str): The user id.
174
+
175
+ ### get_total_budget
176
+ ```python
177
+ def get_total_budget(self, user: str)
178
+ ```
179
+ Returns the total budget of a user.
180
+
181
+ - `user` (str): user id.
182
+
183
+ ### update_cost
184
+ ```python
185
+ def update_cost(self, completion_obj: ModelResponse, user: str)
186
+ ```
187
+ Updates the user's cost.
188
+
189
+ - `completion_obj` (ModelResponse): The completion object received from the model.
190
+ - `user` (str): The user id.
191
+
192
+ ### get_current_cost
193
+ ```python
194
+ def get_current_cost(self, user: str)
195
+ ```
196
+ Returns the current cost of a user.
197
+
198
+ - `user` (str): The user id.
199
+
200
+ ### get_model_cost
201
+ ```python
202
+ def get_model_cost(self, user: str)
203
+ ```
204
+ Returns the model cost of a user.
205
+
206
+ - `user` (str): The user id.
207
+
208
+ ### is_valid_user
209
+ ```python
210
+ def is_valid_user(self, user: str) -> bool
211
+ ```
212
+ Checks if a user is valid.
213
+
214
+ - `user` (str): The user id.
215
+
216
+ ### get_users
217
+ ```python
218
+ def get_users(self)
219
+ ```
220
+ Returns a list of all users.
221
+
222
+ ### reset_cost
223
+ ```python
224
+ def reset_cost(self, user: str)
225
+ ```
226
+ Resets the cost of a user.
227
+
228
+ - `user` (str): The user id.
229
+
230
+ ### reset_on_duration
231
+ ```python
232
+ def reset_on_duration(self, user: str)
233
+ ```
234
+ Resets the cost of a user based on the duration.
235
+
236
+ - `user` (str): The user id.
237
+
238
+ ### update_budget_all_users
239
+ ```python
240
+ def update_budget_all_users(self)
241
+ ```
242
+ Updates the budget for all users.
243
+
244
+ ### save_data
245
+ ```python
246
+ def save_data(self)
247
+ ```
248
+ Stores the user dictionary.
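To tie the reference above together, here is a small sketch using a local client and the methods documented above, with `projected_cost` as a pre-flight check before making a call (assuming it returns the projected session cost in USD):

```python
from litellm import BudgetManager, completion

budget_manager = BudgetManager(project_name="test_project")
user = "1234"

# create a budget if this is a new user
if not budget_manager.is_valid_user(user):
    budget_manager.create_budget(total_budget=10, user=user, duration="daily")

messages = [{"role": "user", "content": "Hey, how's it going?"}]

# pre-flight check: estimate what the session would cost if we make this call
projected = budget_manager.projected_cost(model="gpt-3.5-turbo", messages=messages, user=user)
if projected <= budget_manager.get_total_budget(user):
    response = completion(model="gpt-3.5-turbo", messages=messages)
    budget_manager.update_cost(completion_obj=response, user=user)
    budget_manager.save_data()  # persist the user dictionary
else:
    response = "Sorry - no budget!"

print(budget_manager.get_current_cost(user=user))
```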
docs/my-website/docs/caching/caching_api.md ADDED
@@ -0,0 +1,75 @@
1
+ # Hosted Cache - api.litellm.ai
2
+
3
+ Use api.litellm.ai for caching `completion()` and `embedding()` responses
4
+
5
+ ## Quick Start Usage - Completion
6
+ ```python
7
+ import litellm
8
+ from litellm import completion
9
+ from litellm.caching import Cache
10
+ litellm.cache = Cache(type="hosted") # init cache to use api.litellm.ai
11
+
12
+ # Make completion calls
13
+ response1 = completion(
14
+ model="gpt-3.5-turbo",
15
+ messages=[{"role": "user", "content": "Tell me a joke."}]
16
+ caching=True
17
+ )
18
+
19
+ response2 = completion(
20
+ model="gpt-3.5-turbo",
21
+ messages=[{"role": "user", "content": "Tell me a joke."}],
22
+ caching=True
23
+ )
24
+ # response1 == response2, response 1 is cached
25
+ ```
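If you want to confirm the second call was served from the hosted cache, a quick (if rough) check is to time the two calls, mirroring the embedding example below:

```python
import time
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache(type="hosted")
messages = [{"role": "user", "content": "Tell me a joke."}]

start = time.time()
completion(model="gpt-3.5-turbo", messages=messages, caching=True)
print(f"first call: {time.time() - start:.2f}s")   # hits the LLM API

start = time.time()
completion(model="gpt-3.5-turbo", messages=messages, caching=True)
print(f"second call: {time.time() - start:.2f}s")  # should be much faster - served from the cache
```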
26
+
27
+
28
+ ## Usage - Embedding()
29
+
30
+ ```python
31
+ import time
32
+ import litellm
33
+ from litellm import completion, embedding
34
+ from litellm.caching import Cache
35
+ litellm.cache = Cache(type="hosted")
36
+
37
+ start_time = time.time()
38
+ embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
39
+ end_time = time.time()
40
+ print(f"Embedding 1 response time: {end_time - start_time} seconds")
41
+
42
+ start_time = time.time()
43
+ embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
44
+ end_time = time.time()
45
+ print(f"Embedding 2 response time: {end_time - start_time} seconds")
46
+ ```
47
+
48
+ ## Caching with Streaming
49
+ LiteLLM can cache your streamed responses for you
50
+
51
+ ### Usage
52
+ ```python
53
+ import litellm
54
+ from litellm import completion
55
+ from litellm.caching import Cache
56
+ litellm.cache = Cache(type="hosted")
57
+
58
+ # Make completion calls
59
+ response1 = completion(
60
+ model="gpt-3.5-turbo",
61
+ messages=[{"role": "user", "content": "Tell me a joke."}],
62
+ stream=True,
63
+ caching=True)
64
+ for chunk in response1:
65
+ print(chunk)
66
+
67
+
68
+ response2 = completion(
69
+ model="gpt-3.5-turbo",
70
+ messages=[{"role": "user", "content": "Tell me a joke."}],
71
+ stream=True,
72
+ caching=True)
73
+ for chunk in response2:
74
+ print(chunk)
75
+ ```
docs/my-website/docs/caching/local_caching.md ADDED
@@ -0,0 +1,92 @@
1
+ # LiteLLM - Local Caching
2
+
3
+ ## Caching `completion()` and `embedding()` calls when switched on
4
+
5
+ liteLLM implements exact-match caching and supports the following caching backends:
6
+ * In-Memory Caching [Default]
7
+ * Redis Caching Local
8
+ * Redis Caching Hosted
9
+
10
+ ## Quick Start Usage - Completion
11
+ Cache keys are generated from the call's parameters (e.g. `model` and `messages`).
12
+ The two identical calls below therefore result in a cache hit:
13
+ ```python
14
+ import litellm
15
+ from litellm import completion
16
+ from litellm.caching import Cache
17
+ litellm.cache = Cache()
18
+
19
+ # Make completion calls
20
+ response1 = completion(
21
+ model="gpt-3.5-turbo",
22
+ messages=[{"role": "user", "content": "Tell me a joke."}]
23
+ caching=True
24
+ )
25
+ response2 = completion(
26
+ model="gpt-3.5-turbo",
27
+ messages=[{"role": "user", "content": "Tell me a joke."}],
28
+ caching=True
29
+ )
30
+
31
+ # response1 == response2, response 1 is cached
32
+ ```
33
+
34
+ ## Custom Key-Value Pairs
35
+ Add custom key-value pairs to your cache.
36
+
37
+ ```python
38
+ from litellm.caching import Cache
39
+ cache = Cache()
40
+
41
+ cache.add_cache(cache_key="test-key", result="1234")
42
+
43
+ cache.get_cache(cache_key="test-key)
44
+ ```
45
+
46
+ ## Caching with Streaming
47
+ LiteLLM can cache your streamed responses for you
48
+
49
+ ### Usage
50
+ ```python
51
+ import litellm
52
+ from litellm import completion
53
+ from litellm.caching import Cache
54
+ litellm.cache = Cache()
55
+
56
+ # Make completion calls
57
+ response1 = completion(
58
+ model="gpt-3.5-turbo",
59
+ messages=[{"role": "user", "content": "Tell me a joke."}],
60
+ stream=True,
61
+ caching=True)
62
+ for chunk in response1:
63
+ print(chunk)
64
+ response2 = completion(
65
+ model="gpt-3.5-turbo",
66
+ messages=[{"role": "user", "content": "Tell me a joke."}],
67
+ stream=True,
68
+ caching=True)
69
+ for chunk in response2:
70
+ print(chunk)
71
+ ```
72
+
73
+ ## Usage - Embedding()
74
+ Cache keys are generated from the call's parameters, so the two identical embedding calls below result in a cache hit.
75
+ Compare the printed response times: the second call should return much faster, since it is served from the cache.
76
+ ```python
77
+ import time
78
+ import litellm
79
+ from litellm import embedding
80
+ from litellm.caching import Cache
81
+ litellm.cache = Cache()
82
+
83
+ start_time = time.time()
84
+ embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
85
+ end_time = time.time()
86
+ print(f"Embedding 1 response time: {end_time - start_time} seconds")
87
+
88
+ start_time = time.time()
89
+ embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
90
+ end_time = time.time()
91
+ print(f"Embedding 2 response time: {end_time - start_time} seconds")
92
+ ```
docs/my-website/docs/caching/redis_cache.md ADDED
@@ -0,0 +1,73 @@
1
+ # Redis Cache
2
+ ### Pre-requisites
3
+ Install redis
4
+ ```
5
+ pip install redis
6
+ ```
7
+ For the hosted version, you can set up your own Redis DB here: https://app.redislabs.com/
8
+ ### Usage
9
+ ```python
10
+ import litellm
11
+ from litellm import completion
12
+ from litellm.caching import Cache
13
+ litellm.cache = Cache(type="redis", host=<host>, port=<port>, password=<password>)
14
+
15
+ # Make completion calls
16
+ response1 = completion(
17
+ model="gpt-3.5-turbo",
18
+ messages=[{"role": "user", "content": "Tell me a joke."}],
19
+ caching=True
20
+ )
21
+ response2 = completion(
22
+ model="gpt-3.5-turbo",
23
+ messages=[{"role": "user", "content": "Tell me a joke."}],
24
+ caching=True
25
+ )
26
+
27
+ # response1 == response2, response 1 is cached
28
+ ```
29
+
30
+ ### Custom Cache Keys
31
+
32
+ Define a function that returns the cache key:
33
+ ```python
34
+ # this function takes in *args, **kwargs and returns the key you want to use for caching
35
+ def custom_get_cache_key(*args, **kwargs):
36
+ # return key to use for your cache:
37
+ key = kwargs.get("model", "") + str(kwargs.get("messages", "")) + str(kwargs.get("temperature", "")) + str(kwargs.get("logit_bias", ""))
38
+ print("key for cache", key)
39
+ return key
40
+
41
+ ```
42
+
43
+ Set your function as litellm.cache.get_cache_key
44
+ ```python
45
+ from litellm.caching import Cache
46
+
47
+ cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
48
+
49
+ cache.get_cache_key = custom_get_cache_key # set get_cache_key function for your cache
50
+
51
+ litellm.cache = cache # set litellm.cache to your cache
52
+
53
+ ```
54
+
55
+ ### Detecting Cached Responses
56
+ For responses returned as a cache hit, the response includes the param `cache: True`
57
+
58
+ Example response with cache hit
59
+ ```python
60
+ {
61
+ 'cache': True,
62
+ 'id': 'chatcmpl-7wggdzd6OXhgE2YhcLJHJNZsEWzZ2',
63
+ 'created': 1694221467,
64
+ 'model': 'gpt-3.5-turbo-0613',
65
+ 'choices': [
66
+ {
67
+ 'index': 0, 'message': {'role': 'assistant', 'content': 'I\'m sorry, but I couldn\'t find any information about "litellm" or how many stars it has. It is possible that you may be referring to a specific product, service, or platform that I am not familiar with. Can you please provide more context or clarify your question?'
68
+ }, 'finish_reason': 'stop'}
69
+ ],
70
+ 'usage': {'prompt_tokens': 17, 'completion_tokens': 59, 'total_tokens': 76},
71
+ }
72
+
73
+ ```
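A small sketch that branches on this flag (assuming the response object supports dict-style `.get()` access, as the example above suggests; the Redis connection values are placeholders):

```python
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache(type="redis", host="<host>", port="<port>", password="<password>")

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True,
)

# 'cache' is only True when the response came back from the cache
if response.get("cache", False):
    print("served from the redis cache")
else:
    print("fresh response from the LLM API")
```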
docs/my-website/docs/completion/batching.md ADDED
@@ -0,0 +1,182 @@
1
+ # Batching Completion()
2
+ LiteLLM allows you to:
3
+ * Send many completion calls to 1 model
4
+ * Send 1 completion call to many models: Return Fastest Response
5
+ * Send 1 completion call to many models: Return All Responses
6
+
7
+ ## Send multiple completion calls to 1 model
8
+
9
+ In the batch_completion method, you provide a list of `messages` where each sub-list of messages is passed to `litellm.completion()`, allowing you to process multiple prompts efficiently in a single function call.
10
+
11
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/LiteLLM_batch_completion.ipynb">
12
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
13
+ </a>
14
+
15
+ ### Example Code
16
+ ```python
17
+ import litellm
18
+ import os
19
+ from litellm import batch_completion
20
+
21
+ os.environ['ANTHROPIC_API_KEY'] = ""
22
+
23
+
24
+ responses = batch_completion(
25
+ model="claude-2",
26
+ messages = [
27
+ [
28
+ {
29
+ "role": "user",
30
+ "content": "good morning? "
31
+ }
32
+ ],
33
+ [
34
+ {
35
+ "role": "user",
36
+ "content": "what's the time? "
37
+ }
38
+ ]
39
+ ]
40
+ )
41
+ ```
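`batch_completion` returns one response per sub-list of messages. A short sketch for reading them back out, assuming responses come back in the same order as the input prompts and each element is a standard `ModelResponse` in the OpenAI format:

```python
# responses is a list with one entry per prompt in the batch
for i, response in enumerate(responses):
    content = response["choices"][0]["message"]["content"]
    print(f"prompt {i}: {content}")
```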
42
+
43
+ ## Send 1 completion call to many models: Return Fastest Response
44
+ This makes parallel calls to the specified `models` and returns the first response
45
+
46
+ Use this to reduce latency
47
+
48
+ ### Example Code
49
+ ```python
50
+ import litellm
51
+ import os
52
+ from litellm import batch_completion_models
53
+
54
+ os.environ['ANTHROPIC_API_KEY'] = ""
55
+ os.environ['OPENAI_API_KEY'] = ""
56
+ os.environ['COHERE_API_KEY'] = ""
57
+
58
+ response = batch_completion_models(
59
+ models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
60
+ messages=[{"role": "user", "content": "Hey, how's it going"}]
61
+ )
62
+ print(response)
63
+ ```
64
+
65
+ ### Output
66
+ Returns the first response
67
+ ```json
68
+ {
69
+ "object": "chat.completion",
70
+ "choices": [
71
+ {
72
+ "finish_reason": "stop",
73
+ "index": 0,
74
+ "message": {
75
+ "content": " I'm doing well, thanks for asking! I'm an AI assistant created by Anthropic to be helpful, harmless, and honest.",
76
+ "role": "assistant",
77
+ "logprobs": null
78
+ }
79
+ }
80
+ ],
81
+ "id": "chatcmpl-23273eed-e351-41be-a492-bafcf5cf3274",
82
+ "created": 1695154628.2076092,
83
+ "model": "command-nightly",
84
+ "usage": {
85
+ "prompt_tokens": 6,
86
+ "completion_tokens": 14,
87
+ "total_tokens": 20
88
+ }
89
+ }
90
+ ```
91
+
92
+ ## Send 1 completion call to many models: Return All Responses
93
+ This makes parallel calls to the specified models and returns all responses
94
+
95
+ Use this to process requests concurrently and get responses from multiple models.
96
+
97
+ ### Example Code
98
+ ```python
99
+ import litellm
100
+ import os
101
+ from litellm import batch_completion_models_all_responses
102
+
103
+ os.environ['ANTHROPIC_API_KEY'] = ""
104
+ os.environ['OPENAI_API_KEY'] = ""
105
+ os.environ['COHERE_API_KEY'] = ""
106
+
107
+ responses = batch_completion_models_all_responses(
108
+ models=["gpt-3.5-turbo", "claude-instant-1.2", "command-nightly"],
109
+ messages=[{"role": "user", "content": "Hey, how's it going"}]
110
+ )
111
+ print(responses)
112
+
113
+ ```
114
+
115
+ ### Output
116
+
117
+ ```json
118
+ [<ModelResponse chat.completion id=chatcmpl-e673ec8e-4e8f-4c9e-bf26-bf9fa7ee52b9 at 0x103a62160> JSON: {
119
+ "object": "chat.completion",
120
+ "choices": [
121
+ {
122
+ "finish_reason": "stop_sequence",
123
+ "index": 0,
124
+ "message": {
125
+ "content": " It's going well, thank you for asking! How about you?",
126
+ "role": "assistant",
127
+ "logprobs": null
128
+ }
129
+ }
130
+ ],
131
+ "id": "chatcmpl-e673ec8e-4e8f-4c9e-bf26-bf9fa7ee52b9",
132
+ "created": 1695222060.917964,
133
+ "model": "claude-instant-1.2",
134
+ "usage": {
135
+ "prompt_tokens": 14,
136
+ "completion_tokens": 9,
137
+ "total_tokens": 23
138
+ }
139
+ }, <ModelResponse chat.completion id=chatcmpl-ab6c5bd3-b5d9-4711-9697-e28d9fb8a53c at 0x103a62b60> JSON: {
140
+ "object": "chat.completion",
141
+ "choices": [
142
+ {
143
+ "finish_reason": "stop",
144
+ "index": 0,
145
+ "message": {
146
+ "content": " It's going well, thank you for asking! How about you?",
147
+ "role": "assistant",
148
+ "logprobs": null
149
+ }
150
+ }
151
+ ],
152
+ "id": "chatcmpl-ab6c5bd3-b5d9-4711-9697-e28d9fb8a53c",
153
+ "created": 1695222061.0445492,
154
+ "model": "command-nightly",
155
+ "usage": {
156
+ "prompt_tokens": 6,
157
+ "completion_tokens": 14,
158
+ "total_tokens": 20
159
+ }
160
+ }, <OpenAIObject chat.completion id=chatcmpl-80szFnKHzCxObW0RqCMw1hWW1Icrq at 0x102dd6430> JSON: {
161
+ "id": "chatcmpl-80szFnKHzCxObW0RqCMw1hWW1Icrq",
162
+ "object": "chat.completion",
163
+ "created": 1695222061,
164
+ "model": "gpt-3.5-turbo-0613",
165
+ "choices": [
166
+ {
167
+ "index": 0,
168
+ "message": {
169
+ "role": "assistant",
170
+ "content": "Hello! I'm an AI language model, so I don't have feelings, but I'm here to assist you with any questions or tasks you might have. How can I help you today?"
171
+ },
172
+ "finish_reason": "stop"
173
+ }
174
+ ],
175
+ "usage": {
176
+ "prompt_tokens": 13,
177
+ "completion_tokens": 39,
178
+ "total_tokens": 52
179
+ }
180
+ }]
181
+
182
+ ```
docs/my-website/docs/completion/config.md ADDED
@@ -0,0 +1,49 @@
1
+ # Model Config
2
+
3
+ Model-specific logic can make your code complicated and harder to debug. Use model configs to simplify this.
4
+
5
+ ### Usage
6
+
7
+ Handle prompt logic: different models have different context windows. Use `adapt_to_prompt_size` to automatically pick a model from `available_models` whose context window can fit the prompt (in case the current model's is too small).
8
+
9
+
10
+ ```python
11
+ from litellm import completion_with_config
12
+ import os
13
+
14
+ config = {
15
+ "available_models": ["gpt-3.5-turbo", "claude-instant-1", "gpt-3.5-turbo-16k"],
16
+ "adapt_to_prompt_size": True, # πŸ‘ˆ key change
17
+ }
18
+
19
+ # set env var
20
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
21
+ os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
22
+
23
+
24
+ sample_text = "how does a court case get to the Supreme Court?" * 1000
25
+ messages = [{"content": sample_text, "role": "user"}]
26
+ response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
27
+ ```
28
+
29
+ [**See Code**](https://github.com/BerriAI/litellm/blob/30724d9e51cdc2c3e0eb063271b4f171bc01b382/litellm/utils.py#L2783)
30
+
31
+ ### Complete Config Structure
32
+
33
+ ```python
34
+ config = {
35
+ "default_fallback_models": # [Optional] List of model names to try if a call fails
36
+ "available_models": # [Optional] List of all possible models you could call
37
+ "adapt_to_prompt_size": # [Optional] True/False - if you want to select model based on prompt size (will pick from available_models)
38
+ "model": {
39
+ "model-name": {
40
+ "needs_moderation": # [Optional] True/False - if you want to call openai moderations endpoint before making completion call. Will raise exception, if flagged.
41
+ "error_handling": {
42
+ "error-type": { # One of the errors listed here - https://docs.litellm.ai/docs/exception_mapping#custom-mapping-list
43
+ "fallback_model": "" # str, name of the model it should try instead, when that error occurs
44
+ }
45
+ }
46
+ }
47
+ }
48
+ }
49
+ ```
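Below is a filled-in sketch of that structure. The error-type key and fallback choices are illustrative only; use the exact error names from the exception mapping doc linked above:

```python
from litellm import completion_with_config

config = {
    "default_fallback_models": ["gpt-3.5-turbo", "claude-instant-1"],
    "available_models": ["gpt-3.5-turbo", "gpt-3.5-turbo-16k", "claude-instant-1"],
    "adapt_to_prompt_size": True,
    "model": {
        "gpt-3.5-turbo": {
            "needs_moderation": True,  # call the OpenAI moderations endpoint before completing
            "error_handling": {
                # illustrative error-type key - see the exception mapping doc for the exact names
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            },
        }
    },
}

messages = [{"content": "how does a court case get to the Supreme Court?", "role": "user"}]
response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
```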
docs/my-website/docs/completion/function_call.md ADDED
@@ -0,0 +1,545 @@
1
+ # Function Calling
2
+ Function calling is supported with the following models on OpenAI and Azure OpenAI:
3
+
4
+ - gpt-4
5
+ - gpt-4-1106-preview
6
+ - gpt-4-0613
7
+ - gpt-3.5-turbo
8
+ - gpt-3.5-turbo-1106
9
+ - gpt-3.5-turbo-0613
10
+ - Non-OpenAI LLMs (litellm adds the function call to the prompt for these LLMs)
11
+
12
+ In addition, parallel function calling is supported on the following models:
13
+ - gpt-4-1106-preview
14
+ - gpt-3.5-turbo-1106
15
+
16
+ ## Parallel Function calling
17
+ Parallel function calling is the model's ability to perform multiple function calls together, allowing the effects and results of these function calls to be resolved in parallel
18
+
19
+ ## Quick Start - gpt-3.5-turbo-1106
20
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/Parallel_function_calling.ipynb">
21
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
22
+ </a>
23
+
24
+ In this example we define a single function `get_current_weather`.
25
+
26
+ - Step 1: Send the model the `get_current_weather` with the user question
27
+ - Step 2: Parse the output from the model response - Execute the `get_current_weather` with the model provided args
28
+ - Step 3: Send the model the output from running the `get_current_weather` function
29
+
30
+
31
+ ### Full Code - Parallel function calling with `gpt-3.5-turbo-1106`
32
+
33
+ ```python
34
+ import litellm
35
+ import json
36
+ # set openai api key
37
+ import os
38
+ os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
39
+
40
+ # Example dummy function hard coded to return the same weather
41
+ # In production, this could be your backend API or an external API
42
+ def get_current_weather(location, unit="fahrenheit"):
43
+ """Get the current weather in a given location"""
44
+ if "tokyo" in location.lower():
45
+ return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
46
+ elif "san francisco" in location.lower():
47
+ return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
48
+ elif "paris" in location.lower():
49
+ return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
50
+ else:
51
+ return json.dumps({"location": location, "temperature": "unknown"})
52
+
53
+
54
+ def test_parallel_function_call():
55
+ try:
56
+ # Step 1: send the conversation and available functions to the model
57
+ messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
58
+ tools = [
59
+ {
60
+ "type": "function",
61
+ "function": {
62
+ "name": "get_current_weather",
63
+ "description": "Get the current weather in a given location",
64
+ "parameters": {
65
+ "type": "object",
66
+ "properties": {
67
+ "location": {
68
+ "type": "string",
69
+ "description": "The city and state, e.g. San Francisco, CA",
70
+ },
71
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
72
+ },
73
+ "required": ["location"],
74
+ },
75
+ },
76
+ }
77
+ ]
78
+ response = litellm.completion(
79
+ model="gpt-3.5-turbo-1106",
80
+ messages=messages,
81
+ tools=tools,
82
+ tool_choice="auto", # auto is default, but we'll be explicit
83
+ )
84
+ print("\nFirst LLM Response:\n", response)
85
+ response_message = response.choices[0].message
86
+ tool_calls = response_message.tool_calls
87
+
88
+ print("\nLength of tool calls", len(tool_calls))
89
+
90
+ # Step 2: check if the model wanted to call a function
91
+ if tool_calls:
92
+ # Step 3: call the function
93
+ # Note: the JSON response may not always be valid; be sure to handle errors
94
+ available_functions = {
95
+ "get_current_weather": get_current_weather,
96
+ } # only one function in this example, but you can have multiple
97
+ messages.append(response_message) # extend conversation with assistant's reply
98
+
99
+ # Step 4: send the info for each function call and function response to the model
100
+ for tool_call in tool_calls:
101
+ function_name = tool_call.function.name
102
+ function_to_call = available_functions[function_name]
103
+ function_args = json.loads(tool_call.function.arguments)
104
+ function_response = function_to_call(
105
+ location=function_args.get("location"),
106
+ unit=function_args.get("unit"),
107
+ )
108
+ messages.append(
109
+ {
110
+ "tool_call_id": tool_call.id,
111
+ "role": "tool",
112
+ "name": function_name,
113
+ "content": function_response,
114
+ }
115
+ ) # extend conversation with function response
116
+ second_response = litellm.completion(
117
+ model="gpt-3.5-turbo-1106",
118
+ messages=messages,
119
+ ) # get a new response from the model where it can see the function response
120
+ print("\nSecond LLM response:\n", second_response)
121
+ return second_response
122
+ except Exception as e:
123
+ print(f"Error occurred: {e}")
124
+
125
+ test_parallel_function_call()
126
+ ```
127
+
128
+ ### Explanation - Parallel function calling
129
+ Below is an explanation of what is happening in the code snippet above for Parallel function calling with `gpt-3.5-turbo-1106`
130
+ ### Step 1: litellm.completion() with `tools` set to `get_current_weather`
131
+ ```python
132
+ import litellm
133
+ import json
134
+ # set openai api key
135
+ import os
136
+ os.environ['OPENAI_API_KEY'] = "" # litellm reads OPENAI_API_KEY from .env and sends the request
137
+ # Example dummy function hard coded to return the same weather
138
+ # In production, this could be your backend API or an external API
139
+ def get_current_weather(location, unit="fahrenheit"):
140
+ """Get the current weather in a given location"""
141
+ if "tokyo" in location.lower():
142
+ return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
143
+ elif "san francisco" in location.lower():
144
+ return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
145
+ elif "paris" in location.lower():
146
+ return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
147
+ else:
148
+ return json.dumps({"location": location, "temperature": "unknown"})
149
+
150
+ messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
151
+ tools = [
152
+ {
153
+ "type": "function",
154
+ "function": {
155
+ "name": "get_current_weather",
156
+ "description": "Get the current weather in a given location",
157
+ "parameters": {
158
+ "type": "object",
159
+ "properties": {
160
+ "location": {
161
+ "type": "string",
162
+ "description": "The city and state, e.g. San Francisco, CA",
163
+ },
164
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
165
+ },
166
+ "required": ["location"],
167
+ },
168
+ },
169
+ }
170
+ ]
171
+
172
+ response = litellm.completion(
173
+ model="gpt-3.5-turbo-1106",
174
+ messages=messages,
175
+ tools=tools,
176
+ tool_choice="auto", # auto is default, but we'll be explicit
177
+ )
178
+ print("\nLLM Response1:\n", response)
179
+ response_message = response.choices[0].message
180
+ tool_calls = response.choices[0].message.tool_calls
181
+ ```
182
+
183
+ ##### Expected output
184
+ In the output you can see the model calls the function multiple times - for San Francisco, Tokyo, Paris
185
+ ```json
186
+ ModelResponse(
187
+ id='chatcmpl-8MHBKZ9t6bXuhBvUMzoKsfmmlv7xq',
188
+ choices=[
189
+ Choices(finish_reason='tool_calls',
190
+ index=0,
191
+ message=Message(content=None, role='assistant',
192
+ tool_calls=[
193
+ ChatCompletionMessageToolCall(id='call_DN6IiLULWZw7sobV6puCji1O', function=Function(arguments='{"location": "San Francisco", "unit": "celsius"}', name='get_current_weather'), type='function'),
194
+
195
+ ChatCompletionMessageToolCall(id='call_ERm1JfYO9AFo2oEWRmWUd40c', function=Function(arguments='{"location": "Tokyo", "unit": "celsius"}', name='get_current_weather'), type='function'),
196
+
197
+ ChatCompletionMessageToolCall(id='call_2lvUVB1y4wKunSxTenR0zClP', function=Function(arguments='{"location": "Paris", "unit": "celsius"}', name='get_current_weather'), type='function')
198
+ ]))
199
+ ],
200
+ created=1700319953,
201
+ model='gpt-3.5-turbo-1106',
202
+ object='chat.completion',
203
+ system_fingerprint='fp_eeff13170a',
204
+ usage={'completion_tokens': 77, 'prompt_tokens': 88, 'total_tokens': 165},
205
+ _response_ms=1177.372
206
+ )
207
+ ```
208
+
209
+ ### Step 2 - Parse the Model Response and Execute Functions
210
+ After sending the initial request, parse the model response to identify the function calls it wants to make. In this example, we expect three tool calls, each corresponding to a location (San Francisco, Tokyo, and Paris).
211
+
212
+ ```python
213
+ # Check if the model wants to call a function
214
+ if tool_calls:
215
+ # Execute the functions and prepare responses
216
+ available_functions = {
217
+ "get_current_weather": get_current_weather,
218
+ }
219
+
220
+ messages.append(response_message) # Extend conversation with assistant's reply
221
+
222
+ for tool_call in tool_calls:
223
+ print(f"\nExecuting tool call\n{tool_call}")
224
+ function_name = tool_call.function.name
225
+ function_to_call = available_functions[function_name]
226
+ function_args = json.loads(tool_call.function.arguments)
227
+ # calling the get_current_weather() function
228
+ function_response = function_to_call(
229
+ location=function_args.get("location"),
230
+ unit=function_args.get("unit"),
231
+ )
232
+ print(f"Result from tool call\n{function_response}\n")
233
+
234
+ # Extend conversation with function response
235
+ messages.append(
236
+ {
237
+ "tool_call_id": tool_call.id,
238
+ "role": "tool",
239
+ "name": function_name,
240
+ "content": function_response,
241
+ }
242
+ )
243
+
244
+ ```
245
+
246
+ ### Step 3 - Second litellm.completion() call
247
+ Once the functions are executed, send the model the information for each function call and its response. This allows the model to generate a new response considering the effects of the function calls.
248
+ ```python
249
+ second_response = litellm.completion(
250
+ model="gpt-3.5-turbo-1106",
251
+ messages=messages,
252
+ )
253
+ print("Second Response\n", second_response)
254
+ ```
255
+
256
+ #### Expected output
257
+ ```json
258
+ ModelResponse(
259
+ id='chatcmpl-8MHBLh1ldADBP71OrifKap6YfAd4w',
260
+ choices=[
261
+ Choices(finish_reason='stop', index=0,
262
+ message=Message(content="The current weather in San Francisco is 72°F, in Tokyo it's 10°C, and in Paris it's 22°C.", role='assistant'))
263
+ ],
264
+ created=1700319955,
265
+ model='gpt-3.5-turbo-1106',
266
+ object='chat.completion',
267
+ system_fingerprint='fp_eeff13170a',
268
+ usage={'completion_tokens': 28, 'prompt_tokens': 169, 'total_tokens': 197},
269
+ _response_ms=1032.431
270
+ )
271
+ ```
272
+
273
+ ## Parallel Function Calling - Azure OpenAI
274
+ ```python
275
+ # set Azure env variables
276
+ import os
277
+ os.environ['AZURE_API_KEY'] = "" # litellm reads AZURE_API_KEY from .env and sends the request
278
+ os.environ['AZURE_API_BASE'] = "https://openai-gpt-4-test-v-1.openai.azure.com/"
279
+ os.environ['AZURE_API_VERSION'] = "2023-07-01-preview"
280
+
281
+ import litellm
282
+ import json
283
+ # Example dummy function hard coded to return the same weather
284
+ # In production, this could be your backend API or an external API
285
+ def get_current_weather(location, unit="fahrenheit"):
286
+ """Get the current weather in a given location"""
287
+ if "tokyo" in location.lower():
288
+ return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
289
+ elif "san francisco" in location.lower():
290
+ return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
291
+ elif "paris" in location.lower():
292
+ return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
293
+ else:
294
+ return json.dumps({"location": location, "temperature": "unknown"})
295
+
296
+ ## Step 1: send the conversation and available functions to the model
297
+ messages = [{"role": "user", "content": "What's the weather like in San Francisco, Tokyo, and Paris?"}]
298
+ tools = [
299
+ {
300
+ "type": "function",
301
+ "function": {
302
+ "name": "get_current_weather",
303
+ "description": "Get the current weather in a given location",
304
+ "parameters": {
305
+ "type": "object",
306
+ "properties": {
307
+ "location": {
308
+ "type": "string",
309
+ "description": "The city and state, e.g. San Francisco, CA",
310
+ },
311
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
312
+ },
313
+ "required": ["location"],
314
+ },
315
+ },
316
+ }
317
+ ]
318
+
319
+ response = litellm.completion(
320
+ model="azure/chatgpt-functioncalling", # model = azure/<your-azure-deployment-name>
321
+ messages=messages,
322
+ tools=tools,
323
+ tool_choice="auto", # auto is default, but we'll be explicit
324
+ )
325
+ print("\nLLM Response1:\n", response)
326
+ response_message = response.choices[0].message
327
+ tool_calls = response.choices[0].message.tool_calls
328
+ print("\nTool Choice:\n", tool_calls)
329
+
330
+ ## Step 2 - Parse the Model Response and Execute Functions
331
+ # Check if the model wants to call a function
332
+ if tool_calls:
333
+ # Execute the functions and prepare responses
334
+ available_functions = {
335
+ "get_current_weather": get_current_weather,
336
+ }
337
+
338
+ messages.append(response_message) # Extend conversation with assistant's reply
339
+
340
+ for tool_call in tool_calls:
341
+ print(f"\nExecuting tool call\n{tool_call}")
342
+ function_name = tool_call.function.name
343
+ function_to_call = available_functions[function_name]
344
+ function_args = json.loads(tool_call.function.arguments)
345
+ # calling the get_current_weather() function
346
+ function_response = function_to_call(
347
+ location=function_args.get("location"),
348
+ unit=function_args.get("unit"),
349
+ )
350
+ print(f"Result from tool call\n{function_response}\n")
351
+
352
+ # Extend conversation with function response
353
+ messages.append(
354
+ {
355
+ "tool_call_id": tool_call.id,
356
+ "role": "tool",
357
+ "name": function_name,
358
+ "content": function_response,
359
+ }
360
+ )
361
+
362
+ ## Step 3 - Second litellm.completion() call
363
+ second_response = litellm.completion(
364
+ model="azure/chatgpt-functioncalling",
365
+ messages=messages,
366
+ )
367
+ print("Second Response\n", second_response)
368
+ print("Second Response Message\n", second_response.choices[0].message.content)
369
+
370
+ ```
371
+
372
+ ## Deprecated - Function Calling with `completion(functions=functions)`
373
+ ```python
374
+ import os, litellm
375
+ from litellm import completion
376
+
377
+ os.environ['OPENAI_API_KEY'] = ""
378
+
379
+ messages = [
380
+ {"role": "user", "content": "What is the weather like in Boston?"}
381
+ ]
382
+
383
+ # python function that will get executed
384
+ def get_current_weather(location):
385
+ if location == "Boston, MA":
386
+ return "The weather is 12F"
387
+
388
+ # JSON Schema to pass to OpenAI
389
+ functions = [
390
+ {
391
+ "name": "get_current_weather",
392
+ "description": "Get the current weather in a given location",
393
+ "parameters": {
394
+ "type": "object",
395
+ "properties": {
396
+ "location": {
397
+ "type": "string",
398
+ "description": "The city and state, e.g. San Francisco, CA"
399
+ },
400
+ "unit": {
401
+ "type": "string",
402
+ "enum": ["celsius", "fahrenheit"]
403
+ }
404
+ },
405
+ "required": ["location"]
406
+ }
407
+ }
408
+ ]
409
+
410
+ response = completion(model="gpt-3.5-turbo-0613", messages=messages, functions=functions)
411
+ print(response)
412
+ ```
413
+
414
+ ## litellm.function_to_dict - Convert Functions to dictionary for OpenAI function calling
415
+ `function_to_dict` takes a Python function with a docstring and produces a dictionary usable for OpenAI function calling
416
+
417
+ ### Using `function_to_dict`
418
+ 1. Define your function `get_current_weather`
419
+ 2. Add a docstring to your function `get_current_weather`
420
+ 3. Pass the function to `litellm.utils.function_to_dict` to get the dictionary for OpenAI function calling
421
+
422
+ ```python
423
+ # function with docstring
424
+ def get_current_weather(location: str, unit: str):
425
+ """Get the current weather in a given location
426
+
427
+ Parameters
428
+ ----------
429
+ location : str
430
+ The city and state, e.g. San Francisco, CA
431
+ unit : {'celsius', 'fahrenheit'}
432
+ Temperature unit
433
+
434
+ Returns
435
+ -------
436
+ str
437
+ a sentence indicating the weather
438
+ """
439
+ if location == "Boston, MA":
440
+ return "The weather is 12F"
441
+
442
+ # use litellm.utils.function_to_dict to convert function to dict
443
+ function_json = litellm.utils.function_to_dict(get_current_weather)
444
+ print(function_json)
445
+ ```
446
+
447
+ #### Output from function_to_dict
448
+ ```json
449
+ {
450
+ 'name': 'get_current_weather',
451
+ 'description': 'Get the current weather in a given location',
452
+ 'parameters': {
453
+ 'type': 'object',
454
+ 'properties': {
455
+ 'location': {'type': 'string', 'description': 'The city and state, e.g. San Francisco, CA'},
456
+ 'unit': {'type': 'string', 'description': 'Temperature unit', 'enum': "['fahrenheit', 'celsius']"}
457
+ },
458
+ 'required': ['location', 'unit']
459
+ }
460
+ }
461
+ ```
462
+
463
+ ### Using function_to_dict with Function calling
464
+ ```python
465
+ import os, litellm
466
+ from litellm import completion
467
+
468
+ os.environ['OPENAI_API_KEY'] = ""
469
+
470
+ messages = [
471
+ {"role": "user", "content": "What is the weather like in Boston?"}
472
+ ]
473
+
474
+ def get_current_weather(location: str, unit: str):
475
+ """Get the current weather in a given location
476
+
477
+ Parameters
478
+ ----------
479
+ location : str
480
+ The city and state, e.g. San Francisco, CA
481
+ unit : str {'celsius', 'fahrenheit'}
482
+ Temperature unit
483
+
484
+ Returns
485
+ -------
486
+ str
487
+ a sentence indicating the weather
488
+ """
489
+ if location == "Boston, MA":
490
+ return "The weather is 12F"
491
+
492
+ functions = [litellm.utils.function_to_dict(get_current_weather)]
493
+
494
+ response = completion(model="gpt-3.5-turbo-0613", messages=messages, functions=functions)
495
+ print(response)
496
+ ```
497
+
498
+ ## Function calling for Non-OpenAI LLMs
499
+
500
+ ### Adding Function to prompt
501
+ For non-OpenAI LLMs, LiteLLM allows you to add the function to the prompt by setting `litellm.add_function_to_prompt = True`.
502
+
503
+ #### Usage
504
+ ```python
505
+ import os, litellm
506
+ from litellm import completion
507
+
508
+ # IMPORTANT - Set this to TRUE to add the function to the prompt for Non OpenAI LLMs
509
+ litellm.add_function_to_prompt = True # set add_function_to_prompt for Non OpenAI LLMs
510
+
511
+ os.environ['ANTHROPIC_API_KEY'] = ""
512
+
513
+ messages = [
514
+ {"role": "user", "content": "What is the weather like in Boston?"}
515
+ ]
516
+
517
+ def get_current_weather(location):
518
+ if location == "Boston, MA":
519
+ return "The weather is 12F"
520
+
521
+ functions = [
522
+ {
523
+ "name": "get_current_weather",
524
+ "description": "Get the current weather in a given location",
525
+ "parameters": {
526
+ "type": "object",
527
+ "properties": {
528
+ "location": {
529
+ "type": "string",
530
+ "description": "The city and state, e.g. San Francisco, CA"
531
+ },
532
+ "unit": {
533
+ "type": "string",
534
+ "enum": ["celsius", "fahrenheit"]
535
+ }
536
+ },
537
+ "required": ["location"]
538
+ }
539
+ }
540
+ ]
541
+
542
+ response = completion(model="claude-2", messages=messages, functions=functions)
543
+ print(response)
544
+ ```
545
+
docs/my-website/docs/completion/input.md ADDED
@@ -0,0 +1,582 @@
1
+ import Tabs from '@theme/Tabs';
2
+ import TabItem from '@theme/TabItem';
3
+
4
+ # Input Params
5
+
6
+ ## Common Params
7
+ LiteLLM accepts and translates the [OpenAI Chat Completion params](https://platform.openai.com/docs/api-reference/chat/create) across all providers.
8
+
9
+ ### Usage
10
+ ```python
11
+ import litellm
12
+
13
+ # set env variables
14
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
15
+
16
+ ## SET MAX TOKENS - via completion()
17
+ response = litellm.completion(
18
+ model="gpt-3.5-turbo",
19
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
20
+ max_tokens=10
21
+ )
22
+
23
+ print(response)
24
+ ```
25
+
26
+ ### Translated OpenAI params
27
+ This is a list of openai params we translate across providers.
28
+
29
+ This list is constantly being updated.
30
+
31
+ | Provider | temperature | max_tokens | top_p | stream | stop | n | presence_penalty | frequency_penalty | functions | function_call |
32
+ |---|---|---|---|---|---|---|---|---|---|---|
33
+ |Anthropic| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
34
+ |OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
35
+ |Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
36
+ |Anyscale | ✅ | ✅ | ✅ | ✅ | | | | | | |
37
+ |Cohere| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
38
+ |Huggingface| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
39
+ |Openrouter| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
40
+ |AI21| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | |
41
+ |VertexAI| ✅ | ✅ | | ✅ | | | | | | |
42
+ |Bedrock| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
43
+ |Sagemaker| ✅ | ✅ (only `jumpstart llama2`) | | ✅ | | | | | | |
44
+ |TogetherAI| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
45
+ |AlephAlpha| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
46
+ |Palm| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | |
47
+ |NLP Cloud| ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
48
+ |Petals| ✅ | ✅ | | ✅ | | | | | | |
49
+ |Ollama| ✅ | ✅ | ✅ | ✅ | ✅ | | | ✅ | | |
50
+
51
+ :::note
52
+
53
+ By default, LiteLLM raises an exception if the openai param being passed in isn't supported.
54
+
55
+ To drop the param instead, set `litellm.drop_params = True`.
56
+
57
+ **For function calling:**
58
+
59
+ Add to prompt for non-openai models, set: `litellm.add_function_to_prompt = True`.
60
+ :::
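+
+ For example, here's a minimal sketch of `litellm.drop_params` (assuming a valid Anthropic key is set; per the table above, `n` isn't translated for Anthropic, so it gets dropped instead of raising):
+
+ ```python
+ import os
+ import litellm
+ from litellm import completion
+
+ os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
+ litellm.drop_params = True  # drop unsupported params instead of raising an exception
+
+ response = completion(
+     model="claude-instant-1",
+     messages=[{"role": "user", "content": "Hello, how are you?"}],
+     n=2,  # not translated for Anthropic (see table above) - silently dropped
+ )
+ ```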
61
+
62
+ ## Input Params
63
+
64
+ ```python
65
+ def completion(
66
+ model: str,
67
+ messages: List = [],
68
+ # Optional OpenAI params
69
+ temperature: Optional[float] = None,
70
+ top_p: Optional[float] = None,
71
+ n: Optional[int] = None,
72
+ stream: Optional[bool] = None,
73
+ stop=None,
74
+ max_tokens: Optional[float] = None,
75
+ presence_penalty: Optional[float] = None,
76
+ frequency_penalty: Optional[float]=None,
77
+ logit_bias: dict = {},
78
+ user: str = "",
79
+ deployment_id = None,
80
+ request_timeout: Optional[int] = None,
81
+ response_format: Optional[dict] = None,
82
+ seed: Optional[int] = None,
83
+ tools: Optional[List] = None,
84
+ tool_choice: Optional[str] = None,
85
+ functions: List = [], # soon to be deprecated
86
+ function_call: str = "", # soon to be deprecated
87
+
88
+ # Optional LiteLLM params
89
+ api_base: Optional[str] = None,
90
+ api_version: Optional[str] = None,
91
+ api_key: Optional[str] = None,
92
+ num_retries: Optional[int] = None, # set to retry a model if an APIError, TimeoutError, or ServiceUnavailableError occurs
93
+ context_window_fallback_dict: Optional[dict] = None, # mapping of model to use if call fails due to context window error
94
+ fallbacks: Optional[list] = None, # pass in a list of api_base,keys, etc.
95
+ metadata: Optional[dict] = None # additional call metadata, passed to logging integrations / custom callbacks
96
+
97
+
98
+ **kwargs,
99
+ ) -> ModelResponse:
100
+ ```
101
+ ### Required Fields
102
+
103
+ - `model`: *string* - ID of the model to use. Refer to the model endpoint compatibility table for details on which models work with the Chat API.
104
+
105
+ - `messages`: *array* - A list of messages comprising the conversation so far.
106
+
107
+ #### Properties of `messages`
108
+ *Note* - Each message in the array contains the following properties:
109
+
110
+ - `role`: *string* - The role of the message's author. Roles can be: system, user, assistant, or function.
111
+
112
+ - `content`: *string or null* - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
113
+
114
+ - `name`: *string (optional)* - The name of the author of the message. It is required if the role is "function". The name should match the name of the function represented in the content. It can contain characters (a-z, A-Z, 0-9), and underscores, with a maximum length of 64 characters.
115
+
116
+ - `function_call`: *object (optional)* - The name and arguments of a function that should be called, as generated by the model.
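+
+ For illustration, here's a minimal `messages` array that exercises these properties (the function name and arguments are hypothetical):
+
+ ```python
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "What's the weather in San Francisco?"},
+     # assistant message that requests a function call - `content` may be null here
+     {"role": "assistant", "content": None, "function_call": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, CA\"}"}},
+     # the function's result is passed back with role "function" and a matching `name`
+     {"role": "function", "name": "get_current_weather", "content": "72F and sunny"},
+ ]
+ ```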
117
+
118
+
119
+
120
+ ### Optional Fields
121
+
122
+ - `temperature`: *number or null (optional)* - The sampling temperature to be used, between 0 and 2. Higher values like 0.8 produce more random outputs, while lower values like 0.2 make outputs more focused and deterministic.
123
+
124
+ - `top_p`: *number or null (optional)* - An alternative to sampling with temperature. It instructs the model to consider the results of the tokens with top_p probability. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered.
125
+
126
+ - `n`: *integer or null (optional)* - The number of chat completion choices to generate for each input message.
127
+
128
+ - `stream`: *boolean or null (optional)* - If set to true, it sends partial message deltas. Tokens will be sent as they become available, with the stream terminated by a [DONE] message.
129
+
130
+ - `stop`: *string/ array/ null (optional)* - Up to 4 sequences where the API will stop generating further tokens.
131
+
132
+ - `max_tokens`: *integer (optional)* - The maximum number of tokens to generate in the chat completion.
133
+
134
+ - `presence_penalty`: *number or null (optional)* - It is used to penalize new tokens based on their existence in the text so far.
135
+
136
+ - `response_format`: *object (optional)* - An object specifying the format that the model must output.
137
+
138
+ - Setting to `{ "type": "json_object" }` enables JSON mode, which guarantees the message the model generates is valid JSON.
139
+
140
+ - Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length. A minimal JSON-mode sketch is included after this list.
141
+
142
+ - `seed`: *integer or null (optional)* - This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the `system_fingerprint` response parameter to monitor changes in the backend.
143
+
144
+ - `tools`: *array (optional)* - A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
145
+
146
+ - `type`: *string* - The type of the tool. Currently, only function is supported.
147
+
148
+ - `function`: *object* - Required.
149
+
150
+ - `tool_choice`: *string or object (optional)* - Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"type": "function", "function": {"name": "my_function"}} forces the model to call that function.
151
+
152
+ - `none` is the default when no functions are present. `auto` is the default if functions are present.
153
+
154
+ - `frequency_penalty`: *number or null (optional)* - It is used to penalize new tokens based on their frequency in the text so far.
155
+
156
+ - `logit_bias`: *map (optional)* - Used to modify the probability of specific tokens appearing in the completion.
157
+
158
+ - `user`: *string (optional)* - A unique identifier representing your end-user. This can help OpenAI to monitor and detect abuse.
159
+
160
+ - `timeout`: *int (optional)* - Timeout in seconds for completion requests (Defaults to 600 seconds)
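+
+ A minimal JSON-mode sketch (assuming an OpenAI key is set and a model that supports `response_format`, e.g. `gpt-3.5-turbo-1106`):
+
+ ```python
+ import os
+ from litellm import completion
+
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
+
+ response = completion(
+     model="gpt-3.5-turbo-1106",
+     response_format={"type": "json_object"},
+     # per the note above, also ask for JSON in the prompt itself
+     messages=[{"role": "user", "content": "Return a JSON object with a single key 'city' set to 'SF'."}],
+ )
+ print(response.choices[0].message.content)  # valid JSON string
+ ```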
161
+
162
+ #### Deprecated Params
163
+ - `functions`: *array* - A list of functions that the model may use to generate JSON inputs. Each function should have the following properties:
164
+
165
+ - `name`: *string* - The name of the function to be called. It should contain a-z, A-Z, 0-9, underscores and dashes, with a maximum length of 64 characters.
166
+
167
+ - `description`: *string (optional)* - A description explaining what the function does. It helps the model to decide when and how to call the function.
168
+
169
+ - `parameters`: *object* - The parameters that the function accepts, described as a JSON Schema object.
170
+
171
+ - `function_call`: *string or object (optional)* - Controls how the model responds to function calls.
172
+
173
+
174
+ #### litellm-specific params
175
+
176
+ - `api_base`: *string (optional)* - The api endpoint you want to call the model with
177
+
178
+ - `api_version`: *string (optional)* - (Azure-specific) the api version for the call
179
+
180
+ - `num_retries`: *int (optional)* - The number of times to retry the API call if an APIError, TimeoutError or ServiceUnavailableError occurs
181
+
182
+ - `context_window_fallback_dict`: *dict (optional)* - A mapping of model to use if call fails due to context window error
183
+
184
+ - `fallbacks`: *list (optional)* - A list of model names + params to be used, in case the initial call fails
185
+
186
+ - `metadata`: *dict (optional)* - Any additional data you want to be logged when the call is made (sent to logging integrations, eg. promptlayer and accessible via custom callback function)
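+
+ A minimal sketch combining a few of these litellm-specific params (the metadata values are placeholders):
+
+ ```python
+ from litellm import completion
+
+ response = completion(
+     model="gpt-3.5-turbo",
+     messages=[{"role": "user", "content": "Hello, how are you?"}],
+     num_retries=2,  # retry on APIError / TimeoutError / ServiceUnavailableError
+     context_window_fallback_dict={"gpt-3.5-turbo": "gpt-3.5-turbo-16k"},  # larger-context fallback
+     metadata={"request_id": "my-request-123"},  # forwarded to logging integrations / custom callbacks
+ )
+ ```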
187
+
188
+ ## Provider-specific Params
189
+ Providers might offer params not supported by OpenAI (e.g. top_k). You can pass those in 2 ways:
190
+ - via completion(): We'll pass the non-openai param, straight to the provider as part of the request body.
191
+ - e.g. `completion(model="claude-instant-1", top_k=3)`
192
+ - via provider-specific config variable (e.g. `litellm.OpenAIConfig()`).
193
+
194
+ <Tabs>
195
+ <TabItem value="openai" label="OpenAI">
196
+
197
+ ```python
198
+ import litellm, os
199
+
200
+ # set env variables
201
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
202
+
203
+ ## SET MAX TOKENS - via completion()
204
+ response_1 = litellm.completion(
205
+ model="gpt-3.5-turbo",
206
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
207
+ max_tokens=10
208
+ )
209
+
210
+ response_1_text = response_1.choices[0].message.content
211
+
212
+ ## SET MAX TOKENS - via config
213
+ litellm.OpenAIConfig(max_tokens=200)
214
+
215
+ response_2 = litellm.completion(
216
+ model="gpt-3.5-turbo",
217
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
218
+ )
219
+
220
+ response_2_text = response_2.choices[0].message.content
221
+
222
+ ## TEST OUTPUT
223
+ assert len(response_2_text) > len(response_1_text)
224
+ ```
225
+
226
+ </TabItem>
227
+ <TabItem value="openai-text" label="OpenAI Text Completion">
228
+
229
+ ```python
230
+ import litellm, os
231
+
232
+ # set env variables
233
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
234
+
235
+
236
+ ## SET MAX TOKENS - via completion()
237
+ response_1 = litellm.completion(
238
+ model="text-davinci-003",
239
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
240
+ max_tokens=10
241
+ )
242
+
243
+ response_1_text = response_1.choices[0].message.content
244
+
245
+ ## SET MAX TOKENS - via config
246
+ litellm.OpenAITextCompletionConfig(max_tokens=200)
247
+ response_2 = litellm.completion(
248
+ model="text-davinci-003",
249
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
250
+ )
251
+
252
+ response_2_text = response_2.choices[0].message.content
253
+
254
+ ## TEST OUTPUT
255
+ assert len(response_2_text) > len(response_1_text)
256
+ ```
257
+
258
+ </TabItem>
259
+ <TabItem value="azure-openai" label="Azure OpenAI">
260
+
261
+ ```python
262
+ import litellm, os
263
+
264
+ # set env variables
265
+ os.environ["AZURE_API_BASE"] = "your-azure-api-base"
266
+ os.environ["AZURE_API_TYPE"] = "azure" # [OPTIONAL]
267
+ os.environ["AZURE_API_VERSION"] = "2023-07-01-preview" # [OPTIONAL]
268
+
269
+ ## SET MAX TOKENS - via completion()
270
+ response_1 = litellm.completion(
271
+ model="azure/chatgpt-v-2",
272
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
273
+ max_tokens=10
274
+ )
275
+
276
+ response_1_text = response_1.choices[0].message.content
277
+
278
+ ## SET MAX TOKENS - via config
279
+ litellm.AzureOpenAIConfig(max_tokens=200)
280
+ response_2 = litellm.completion(
281
+ model="azure/chatgpt-v-2",
282
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
283
+ )
284
+
285
+ response_2_text = response_2.choices[0].message.content
286
+
287
+ ## TEST OUTPUT
288
+ assert len(response_2_text) > len(response_1_text)
289
+ ```
290
+
291
+ </TabItem>
292
+ <TabItem value="anthropic" label="Anthropic">
293
+
294
+ ```python
295
+ import litellm, os
296
+
297
+ # set env variables
298
+ os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
299
+
300
+ ## SET MAX TOKENS - via completion()
301
+ response_1 = litellm.completion(
302
+ model="claude-instant-1",
303
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
304
+ max_tokens=10
305
+ )
306
+
307
+ response_1_text = response_1.choices[0].message.content
308
+
309
+ ## SET MAX TOKENS - via config
310
+ litellm.AnthropicConfig(max_tokens_to_sample=200)
311
+ response_2 = litellm.completion(
312
+ model="claude-instant-1",
313
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
314
+ )
315
+
316
+ response_2_text = response_2.choices[0].message.content
317
+
318
+ ## TEST OUTPUT
319
+ assert len(response_2_text) > len(response_1_text)
320
+ ```
321
+
322
+ </TabItem>
323
+
324
+ <TabItem value="huggingface" label="Huggingface">
325
+
326
+ ```python
327
+ import litellm, os
328
+
329
+ # set env variables
330
+ os.environ["HUGGINGFACE_API_KEY"] = "your-huggingface-key" #[OPTIONAL]
331
+
332
+ ## SET MAX TOKENS - via completion()
333
+ response_1 = litellm.completion(
334
+ model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
335
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
336
+ api_base="https://your-huggingface-api-endpoint",
337
+ max_tokens=10
338
+ )
339
+
340
+ response_1_text = response_1.choices[0].message.content
341
+
342
+ ## SET MAX TOKENS - via config
343
+ litellm.HuggingfaceConfig(max_new_tokens=200)
344
+ response_2 = litellm.completion(
345
+ model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
346
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
347
+ api_base="https://your-huggingface-api-endpoint"
348
+ )
349
+
350
+ response_2_text = response_2.choices[0].message.content
351
+
352
+ ## TEST OUTPUT
353
+ assert len(response_2_text) > len(response_1_text)
354
+ ```
355
+
356
+ </TabItem>
357
+
358
+ <TabItem value="together_ai" label="TogetherAI">
359
+
360
+
361
+ ```python
362
+ import litellm, os
363
+
364
+ # set env variables
365
+ os.environ["TOGETHERAI_API_KEY"] = "your-togetherai-key"
366
+
367
+ ## SET MAX TOKENS - via completion()
368
+ response_1 = litellm.completion(
369
+ model="together_ai/togethercomputer/llama-2-70b-chat",
370
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
371
+ max_tokens=10
372
+ )
373
+
374
+ response_1_text = response_1.choices[0].message.content
375
+
376
+ ## SET MAX TOKENS - via config
377
+ litellm.TogetherAIConfig(max_tokens_to_sample=200)
378
+ response_2 = litellm.completion(
379
+ model="together_ai/togethercomputer/llama-2-70b-chat",
380
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
381
+ )
382
+
383
+ response_2_text = response_2.choices[0].message.content
384
+
385
+ ## TEST OUTPUT
386
+ assert len(response_2_text) > len(response_1_text)
387
+ ```
388
+
389
+ </TabItem>
390
+
391
+ <TabItem value="ollama" label="Ollama">
392
+
393
+ ```python
394
+ import litellm, os
395
+
396
+ ## SET MAX TOKENS - via completion()
397
+ response_1 = litellm.completion(
398
+ model="ollama/llama2",
399
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
400
+ max_tokens=10
401
+ )
402
+
403
+ response_1_text = response_1.choices[0].message.content
404
+
405
+ ## SET MAX TOKENS - via config
406
+ litellm.OllamaConfig(num_predict=200)
407
+ response_2 = litellm.completion(
408
+ model="ollama/llama2",
409
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
410
+ )
411
+
412
+ response_2_text = response_2.choices[0].message.content
413
+
414
+ ## TEST OUTPUT
415
+ assert len(response_2_text) > len(response_1_text)
416
+ ```
417
+
418
+ </TabItem>
419
+
420
+ <TabItem value="replicate" label="Replicate">
421
+
422
+ ```python
423
+ import litellm, os
424
+
425
+ # set env variables
426
+ os.environ["REPLICATE_API_KEY"] = "your-replicate-key"
427
+
428
+ ## SET MAX TOKENS - via completion()
429
+ response_1 = litellm.completion(
430
+ model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
431
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
432
+ max_tokens=10
433
+ )
434
+
435
+ response_1_text = response_1.choices[0].message.content
436
+
437
+ ## SET MAX TOKENS - via config
438
+ litellm.ReplicateConfig(max_new_tokens=200)
439
+ response_2 = litellm.completion(
440
+ model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
441
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
442
+ )
443
+
444
+ response_2_text = response_2.choices[0].message.content
445
+
446
+ ## TEST OUTPUT
447
+ assert len(response_2_text) > len(response_1_text)
448
+ ```
449
+
450
+ </TabItem>
451
+
452
+ <TabItem value="petals" label="Petals">
453
+
454
+
455
+ ```python
456
+ import litellm
457
+
458
+ ## SET MAX TOKENS - via completion()
459
+ response_1 = litellm.completion(
460
+ model="petals/petals-team/StableBeluga2",
461
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
462
+ api_base="https://chat.petals.dev/api/v1/generate",
463
+ max_tokens=10
464
+ )
465
+
466
+ response_1_text = response_1.choices[0].message.content
467
+
468
+ ## SET MAX TOKENS - via config
469
+ litellm.PetalsConfig(max_new_tokens=200)
470
+ response_2 = litellm.completion(
471
+ model="petals/petals-team/StableBeluga2",
472
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
473
+ api_base="https://chat.petals.dev/api/v1/generate",
474
+ )
475
+
476
+ response_2_text = response_2.choices[0].message.content
477
+
478
+ ## TEST OUTPUT
479
+ assert len(response_2_text) > len(response_1_text)
480
+ ```
481
+
482
+ </TabItem>
483
+
484
+ <TabItem value="palm" label="Palm">
485
+
486
+ ```python
487
+ import litellm, os
488
+
489
+ # set env variables
490
+ os.environ["PALM_API_KEY"] = "your-palm-key"
491
+
492
+ ## SET MAX TOKENS - via completion()
493
+ response_1 = litellm.completion(
494
+ model="palm/chat-bison",
495
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
496
+ max_tokens=10
497
+ )
498
+
499
+ response_1_text = response_1.choices[0].message.content
500
+
501
+ ## SET MAX TOKENS - via config
502
+ litellm.PalmConfig(maxOutputTokens=200)
503
+ response_2 = litellm.completion(
504
+ model="palm/chat-bison",
505
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
506
+ )
507
+
508
+ response_2_text = response_2.choices[0].message.content
509
+
510
+ ## TEST OUTPUT
511
+ assert len(response_2_text) > len(response_1_text)
512
+ ```
513
+ </TabItem>
514
+
515
+ <TabItem value="ai21" label="AI21">
516
+
517
+ ```python
518
+ import litellm, os
519
+
520
+ # set env variables
521
+ os.environ["AI21_API_KEY"] = "your-ai21-key"
522
+
523
+ ## SET MAX TOKENS - via completion()
524
+ response_1 = litellm.completion(
525
+ model="j2-mid",
526
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
527
+ max_tokens=10
528
+ )
529
+
530
+ response_1_text = response_1.choices[0].message.content
531
+
532
+ ## SET MAX TOKENS - via config
533
+ litellm.AI21Config(maxOutputTokens=200)
534
+ response_2 = litellm.completion(
535
+ model="j2-mid",
536
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
537
+ )
538
+
539
+ response_2_text = response_2.choices[0].message.content
540
+
541
+ ## TEST OUTPUT
542
+ assert len(response_2_text) > len(response_1_text)
543
+ ```
544
+
545
+ </TabItem>
546
+
547
+ <TabItem value="cohere" label="Cohere">
548
+
549
+ ```python
550
+ import litellm, os
551
+
552
+ # set env variables
553
+ os.environ["COHERE_API_KEY"] = "your-cohere-key"
554
+
555
+ ## SET MAX TOKENS - via completion()
556
+ response_1 = litellm.completion(
557
+ model="command-nightly",
558
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
559
+ max_tokens=10
560
+ )
561
+
562
+ response_1_text = response_1.choices[0].message.content
563
+
564
+ ## SET MAX TOKENS - via config
565
+ litellm.CohereConfig(max_tokens=200)
566
+ response_2 = litellm.completion(
567
+ model="command-nightly",
568
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
569
+ )
570
+
571
+ response_2_text = response_2.choices[0].message.content
572
+
573
+ ## TEST OUTPUT
574
+ assert len(response_2_text) > len(response_1_text)
575
+ ```
576
+
577
+ </TabItem>
578
+
579
+ </Tabs>
580
+
581
+
582
+ [**Check out the tutorial!**](../tutorials/provider_specific_params.md)
docs/my-website/docs/completion/message_trimming.md ADDED
@@ -0,0 +1,36 @@
1
+ # Trimming Input Messages
2
+ **Use litellm.trim_messages() to ensure messages do not exceed a model's token limit or the specified `max_tokens`**
3
+
4
+ ## Usage
5
+ ```python
6
+ from litellm import completion
7
+ from litellm.utils import trim_messages
8
+
9
+ response = completion(
10
+ model=model,
11
+ messages=trim_messages(messages, model) # trim_messages ensures tokens(messages) < max_tokens(model)
12
+ )
13
+ ```
14
+
15
+ ## Usage - set max_tokens
16
+ ```python
17
+ from litellm import completion
18
+ from litellm.utils import trim_messages
19
+
20
+ response = completion(
21
+ model=model,
22
+ messages=trim_messages(messages, model, max_tokens=10), # trim_messages ensures tokens(messages) < max_tokens
23
+ )
24
+ ```
25
+
26
+ ## Parameters
27
+
28
+ The function uses the following parameters:
29
+
30
+ - `messages`:[Required] This should be a list of input messages
31
+
32
+ - `model`:[Optional] This is the LiteLLM model being used. This parameter is optional, as you can alternatively specify the `max_tokens` parameter.
33
+
34
+ - `max_tokens`:[Optional] This is an int, manually set upper limit on messages
35
+
36
+ - `trim_ratio`:[Optional] This represents the target ratio of tokens to use after trimming. Its default value is 0.75, which means messages will be trimmed down to roughly 75% of the model's token limit (see the sketch below).
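+
+ A minimal sketch of `trim_ratio` (assuming a valid OpenAI key is set):
+
+ ```python
+ from litellm import completion
+ from litellm.utils import trim_messages
+
+ messages = [{"role": "user", "content": "long conversation history ... " * 500}]
+
+ # trim to roughly 50% of gpt-3.5-turbo's token limit before sending
+ trimmed = trim_messages(messages, model="gpt-3.5-turbo", trim_ratio=0.5)
+ response = completion(model="gpt-3.5-turbo", messages=trimmed)
+ ```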
docs/my-website/docs/completion/mock_requests.md ADDED
@@ -0,0 +1,72 @@
1
+ # Mock Completion() Responses - Save Testing Costs πŸ’°
2
+
3
+ For testing purposes, you can use `completion()` with `mock_response` to mock calling the completion endpoint.
4
+
5
+ This will return a response object with a default response (works for streaming as well), without calling the LLM APIs.
6
+
7
+ ## quick start
8
+ ```python
9
+ from litellm import completion
10
+
11
+ model = "gpt-3.5-turbo"
12
+ messages = [{"role":"user", "content":"This is a test request"}]
13
+
14
+ completion(model=model, messages=messages, mock_response="It's simple to use and easy to get started")
15
+ ```
16
+
17
+ ## streaming
18
+
19
+ ```python
20
+ from litellm import completion
21
+ model = "gpt-3.5-turbo"
22
+ messages = [{"role": "user", "content": "Hey, I'm a mock request"}]
23
+ response = completion(model=model, messages=messages, stream=True, mock_response="It's simple to use and easy to get started")
24
+ complete_response = ""
+ for chunk in response:
25
+ print(chunk) # {'choices': [{'delta': {'role': 'assistant', 'content': 'Thi'}, 'finish_reason': None}]}
26
+ complete_response += chunk["choices"][0]["delta"]["content"]
27
+ ```
28
+
29
+ ## (Non-streaming) Mock Response Object
30
+
31
+ ```json
32
+ {
33
+ "choices": [
34
+ {
35
+ "finish_reason": "stop",
36
+ "index": 0,
37
+ "message": {
38
+ "content": "This is a mock request",
39
+ "role": "assistant",
40
+ "logprobs": null
41
+ }
42
+ }
43
+ ],
44
+ "created": 1694459929.4496052,
45
+ "model": "MockResponse",
46
+ "usage": {
47
+ "prompt_tokens": null,
48
+ "completion_tokens": null,
49
+ "total_tokens": null
50
+ }
51
+ }
52
+ ```
53
+
54
+ ## Building a pytest function using `completion` with `mock_response`
55
+
56
+ ```python
57
+ from litellm import completion
58
+ import pytest
59
+
60
+ def test_completion_openai():
61
+ try:
62
+ response = completion(
63
+ model="gpt-3.5-turbo",
64
+ messages=[{"role":"user", "content":"Why is LiteLLM amazing?"}],
65
+ mock_response="LiteLLM is awesome"
66
+ )
67
+ # Add any assertions here to check the response
68
+ print(response)
69
+ assert(response['choices'][0]['message']['content'] == "LiteLLM is awesome")
70
+ except Exception as e:
71
+ pytest.fail(f"Error occurred: {e}")
72
+ ```
docs/my-website/docs/completion/model_alias.md ADDED
@@ -0,0 +1,53 @@
1
+ # Model Alias
2
+
3
+ The model name you show an end-user might be different from the one you pass to LiteLLM - e.g. Displaying `GPT-3.5` while calling `gpt-3.5-turbo-16k` on the backend.
4
+
5
+ LiteLLM simplifies this by letting you pass in a model alias mapping.
6
+
7
+ ## Expected Format
8
+
9
+ ```python
10
+ litellm.model_alias_map = {
11
+ # a dictionary containing a mapping of the alias string to the actual litellm model name string
12
+ "model_alias": "litellm_model_name"
13
+ }
14
+ ```
15
+
16
+ ## Usage
17
+
18
+ ### Relevant Code
19
+ ```python
20
+ model_alias_map = {
21
+ "GPT-3.5": "gpt-3.5-turbo-16k",
22
+ "llama2": "replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf"
23
+ }
24
+
25
+ litellm.model_alias_map = model_alias_map
26
+ ```
27
+
28
+ ### Complete Code
29
+ ```python
30
+ import litellm, os
31
+ from litellm import completion
32
+
33
+
34
+ ## set ENV variables
35
+ os.environ["OPENAI_API_KEY"] = "openai key"
36
+ os.environ["REPLICATE_API_KEY"] = "replicate key"
37
+
38
+ ## set model alias map
39
+ model_alias_map = {
40
+ "GPT-3.5": "gpt-3.5-turbo-16k",
41
+ "llama2": "replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf"
42
+ }
43
+
44
+ litellm.model_alias_map = model_alias_map
45
+
46
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
47
+
48
+ # call "gpt-3.5-turbo-16k"
49
+ response = completion(model="GPT-3.5", messages=messages)
50
+
51
+ # call replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca1...
52
+ response = completion("llama2", messages)
53
+ ```
docs/my-website/docs/completion/multiple_deployments.md ADDED
@@ -0,0 +1,53 @@
1
+ # Multiple Deployments
2
+
3
+ If you have multiple deployments of the same model, you can pass the list of deployments, and LiteLLM will return the first result.
4
+
5
+ ## Quick Start
6
+
7
+ Multiple providers offer Mistral-7B-Instruct.
8
+
9
+ Here's how you can use litellm to return the first result:
10
+
11
+ ```python
12
+ from litellm import completion
13
+
14
+ messages=[{"role": "user", "content": "Hey, how's it going?"}]
15
+
16
+ ## All your mistral deployments ##
17
+ model_list = [{
18
+ "model_name": "mistral-7b-instruct",
19
+ "litellm_params": { # params for litellm completion/embedding call
20
+ "model": "replicate/mistralai/mistral-7b-instruct-v0.1:83b6a56e7c828e667f21fd596c338fd4f0039b46bcfa18d973e8e70e455fda70",
21
+ "api_key": "replicate_api_key",
22
+ }
23
+ }, {
24
+ "model_name": "mistral-7b-instruct",
25
+ "litellm_params": { # params for litellm completion/embedding call
26
+ "model": "together_ai/mistralai/Mistral-7B-Instruct-v0.1",
27
+ "api_key": "togetherai_api_key",
28
+ }
29
+ }, {
30
+ "model_name": "mistral-7b-instruct",
31
+ "litellm_params": { # params for litellm completion/embedding call
32
+ "model": "together_ai/mistralai/Mistral-7B-Instruct-v0.1",
33
+ "api_key": "togetherai_api_key",
34
+ }
35
+ }, {
36
+ "model_name": "mistral-7b-instruct",
37
+ "litellm_params": { # params for litellm completion/embedding call
38
+ "model": "perplexity/mistral-7b-instruct",
39
+ "api_key": "perplexity_api_key"
40
+ }
41
+ }, {
42
+ "model_name": "mistral-7b-instruct",
43
+ "litellm_params": {
44
+ "model": "deepinfra/mistralai/Mistral-7B-Instruct-v0.1",
45
+ "api_key": "deepinfra_api_key"
46
+ }
47
+ }]
48
+
49
+ ## LiteLLM completion call ## returns first response
50
+ response = completion(model="mistral-7b-instruct", messages=messages, model_list=model_list)
51
+
52
+ print(response)
53
+ ```
docs/my-website/docs/completion/output.md ADDED
@@ -0,0 +1,68 @@
1
+ # Output
2
+
3
+ ## Format
4
+ Here's the exact json output and type you can expect from all litellm `completion` calls for all models
5
+
6
+ ```python
7
+ {
8
+ 'choices': [
9
+ {
10
+ 'finish_reason': str, # String: 'stop'
11
+ 'index': int, # Integer: 0
12
+ 'message': { # Dictionary [str, str]
13
+ 'role': str, # String: 'assistant'
14
+ 'content': str # String: "default message"
15
+ }
16
+ }
17
+ ],
18
+ 'created': str, # String: None
19
+ 'model': str, # String: None
20
+ 'usage': { # Dictionary [str, int]
21
+ 'prompt_tokens': int, # Integer
22
+ 'completion_tokens': int, # Integer
23
+ 'total_tokens': int # Integer
24
+ }
25
+ }
26
+
27
+ ```
28
+
29
+ You can access the response as a dictionary or as a class object, just as OpenAI allows you to:
30
+ ```python
31
+ print(response.choices[0].message.content)
32
+ print(response['choices'][0]['message']['content'])
33
+ ```
34
+
35
+ Here's what an example response looks like
36
+ ```python
37
+ {
38
+ 'choices': [
39
+ {
40
+ 'finish_reason': 'stop',
41
+ 'index': 0,
42
+ 'message': {
43
+ 'role': 'assistant',
44
+ 'content': " I'm doing well, thank you for asking. I am Claude, an AI assistant created by Anthropic."
45
+ }
46
+ }
47
+ ],
48
+ 'created': 1691429984.3852863,
49
+ 'model': 'claude-instant-1',
50
+ 'usage': {'prompt_tokens': 18, 'completion_tokens': 23, 'total_tokens': 41}
51
+ }
52
+ ```
53
+
54
+ ## Additional Attributes
55
+
56
+ You can also access information like latency.
57
+
58
+ ```python
59
+ from litellm import completion
60
+ import os
61
+ os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
62
+
63
+ messages=[{"role": "user", "content": "Hey!"}]
64
+
65
+ response = completion(model="claude-2", messages=messages)
66
+
67
+ print(response.response_ms) # 616.25
68
+ ```
docs/my-website/docs/completion/prompt_formatting.md ADDED
@@ -0,0 +1,86 @@
1
+ # Prompt Formatting
2
+
3
+ LiteLLM automatically translates the OpenAI ChatCompletions prompt format, to other models. You can control this by setting a custom prompt template for a model as well.
4
+
5
+ ## Huggingface Models
6
+
7
+ LiteLLM supports [Huggingface Chat Templates](https://huggingface.co/docs/transformers/main/chat_templating), and will automatically check if your huggingface model has a registered chat template (e.g. [Mistral-7b](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json#L32)).
8
+
9
+ For popular models (e.g. meta-llama/llama2), we have their templates saved as part of the package.
10
+
11
+ **Stored Templates**
12
+
13
+ | Model Name | Works for Models | Completion Call
14
+ | -------- | -------- | -------- |
15
+ | mistralai/Mistral-7B-Instruct-v0.1 | mistralai/Mistral-7B-Instruct-v0.1| `completion(model='huggingface/mistralai/Mistral-7B-Instruct-v0.1', messages=messages, api_base="your_api_endpoint")` |
16
+ | meta-llama/Llama-2-7b-chat | All meta-llama llama2 chat models| `completion(model='huggingface/meta-llama/Llama-2-7b', messages=messages, api_base="your_api_endpoint")` |
17
+ | tiiuae/falcon-7b-instruct | All falcon instruct models | `completion(model='huggingface/tiiuae/falcon-7b-instruct', messages=messages, api_base="your_api_endpoint")` |
18
+ | mosaicml/mpt-7b-chat | All mpt chat models | `completion(model='huggingface/mosaicml/mpt-7b-chat', messages=messages, api_base="your_api_endpoint")` |
19
+ | codellama/CodeLlama-34b-Instruct-hf | All codellama instruct models | `completion(model='huggingface/codellama/CodeLlama-34b-Instruct-hf', messages=messages, api_base="your_api_endpoint")` |
20
+ | WizardLM/WizardCoder-Python-34B-V1.0 | All wizardcoder models | `completion(model='huggingface/WizardLM/WizardCoder-Python-34B-V1.0', messages=messages, api_base="your_api_endpoint")` |
21
+ | Phind/Phind-CodeLlama-34B-v2 | All phind-codellama models | `completion(model='huggingface/Phind/Phind-CodeLlama-34B-v2', messages=messages, api_base="your_api_endpoint")` |
22
+
23
+ [**Jump to code**](https://github.com/BerriAI/litellm/blob/main/litellm/llms/prompt_templates/factory.py)
24
+
25
+ ## Format Prompt Yourself
26
+
27
+ You can also format the prompt yourself. Here's how:
28
+
29
+ ```python
30
+ import litellm
+ from litellm import completion
31
+ # Create your own custom prompt template
32
+ litellm.register_prompt_template(
33
+ model="togethercomputer/LLaMA-2-7B-32K",
34
+ initial_prompt_value="You are a good assistant", # [OPTIONAL]
35
+ roles={
36
+ "system": {
37
+ "pre_message": "[INST] <<SYS>>\n", # [OPTIONAL]
38
+ "post_message": "\n<</SYS>>\n [/INST]\n" # [OPTIONAL]
39
+ },
40
+ "user": {
41
+ "pre_message": "[INST] ", # [OPTIONAL]
42
+ "post_message": " [/INST]" # [OPTIONAL]
43
+ },
44
+ "assistant": {
45
+ "pre_message": "\n", # [OPTIONAL]
46
+ "post_message": "\n" # [OPTIONAL]
47
+ }
48
+ },
49
+ final_prompt_value="Now answer as best you can:" # [OPTIONAL]
50
+ )
51
+
52
+ def test_huggingface_custom_model():
53
+ model = "huggingface/togethercomputer/LLaMA-2-7B-32K"
54
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
+ response = completion(model=model, messages=messages, api_base="https://my-huggingface-endpoint")
55
+ print(response['choices'][0]['message']['content'])
56
+ return response
57
+
58
+ test_huggingface_custom_model()
59
+ ```
60
+
61
+ This is currently supported for Huggingface, TogetherAI, Ollama, and Petals.
62
+
63
+ Other providers either have fixed prompt templates (e.g. Anthropic), or format it themselves (e.g. Replicate). If there's a provider we're missing coverage for, let us know!
64
+
65
+ ## All Providers
66
+
67
+ Here's the code for how we format all providers. Let us know how we can improve this further
68
+
69
+
70
+ | Provider | Model Name | Code |
71
+ | -------- | -------- | -------- |
72
+ | Anthropic | `claude-instant-1`, `claude-instant-1.2`, `claude-2` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/anthropic.py#L84)
73
+ | OpenAI Text Completion | `text-davinci-003`, `text-curie-001`, `text-babbage-001`, `text-ada-001`, `babbage-002`, `davinci-002`, | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/main.py#L442)
74
+ | Replicate | all model names starting with `replicate/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/replicate.py#L180)
75
+ | Cohere | `command-nightly`, `command`, `command-light`, `command-medium-beta`, `command-xlarge-beta` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/cohere.py#L115)
76
+ | Huggingface | all model names starting with `huggingface/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/huggingface_restapi.py#L186)
77
+ | OpenRouter | all model names starting with `openrouter/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/main.py#L611)
78
+ | AI21 | `j2-mid`, `j2-light`, `j2-ultra` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/ai21.py#L107)
79
+ | VertexAI | `text-bison`, `text-bison@001`, `chat-bison`, `chat-bison@001`, `chat-bison-32k`, `code-bison`, `code-bison@001`, `code-gecko@001`, `code-gecko@latest`, `codechat-bison`, `codechat-bison@001`, `codechat-bison-32k` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/vertex_ai.py#L89)
80
+ | Bedrock | all model names starting with `bedrock/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/bedrock.py#L183)
81
+ | Sagemaker | `sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/sagemaker.py#L89)
82
+ | TogetherAI | all model names starting with `together_ai/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/together_ai.py#L101)
83
+ | AlephAlpha | all model names starting with `aleph_alpha/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/aleph_alpha.py#L184)
84
+ | Palm | all model names starting with `palm/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/palm.py#L95)
85
+ | NLP Cloud | all model names starting with `nlp_cloud/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/nlp_cloud.py#L120)
86
+ | Petals | all model names starting with `petals/` | [Code](https://github.com/BerriAI/litellm/blob/721564c63999a43f96ee9167d0530759d51f8d45/litellm/llms/petals.py#L87)
docs/my-website/docs/completion/reliable_completions.md ADDED
@@ -0,0 +1,196 @@
1
+ # Reliability - Retries, Fallbacks
2
+
3
+ LiteLLM helps prevent failed requests in 2 ways:
4
+ - Retries
5
+ - Fallbacks: Context Window + General
6
+
7
+ ## Helper utils
8
+ LiteLLM supports the following functions for reliability:
9
+ * `litellm.longer_context_model_fallback_dict`: Dictionary which has a mapping for those models which have larger equivalents
10
+ * `num_retries`: use tenacity retries
11
+ * `completion()` with fallbacks: switch between models/keys/api bases in case of errors.
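+
+ For example, `litellm.longer_context_model_fallback_dict` maps a model to a larger-context equivalent (a quick sketch - the exact contents depend on your litellm version):
+
+ ```python
+ import litellm
+
+ # e.g. "gpt-3.5-turbo" -> "gpt-3.5-turbo-16k"
+ print(litellm.longer_context_model_fallback_dict.get("gpt-3.5-turbo"))
+ ```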
12
+
13
+ ## Retry failed requests
14
+
15
+ Call it in `completion()` like this: `completion(..., num_retries=2)`.
16
+
17
+
18
+ Here's a quick look at how you can use it:
19
+
20
+ ```python
21
+ from litellm import completion
22
+
23
+ user_message = "Hello, whats the weather in San Francisco??"
24
+ messages = [{"content": user_message, "role": "user"}]
25
+
26
+ # normal call
27
+ response = completion(
28
+ model="gpt-3.5-turbo",
29
+ messages=messages,
30
+ num_retries=2
31
+ )
32
+ ```
33
+
34
+ ## Fallbacks
35
+
36
+ ### Context Window Fallbacks
37
+ ```python
38
+ from litellm import completion
39
+
40
+ fallback_dict = {"gpt-3.5-turbo": "gpt-3.5-turbo-16k"}
41
+ messages = [{"content": "how does a court case get to the Supreme Court?" * 500, "role": "user"}]
42
+
43
+ completion(model="gpt-3.5-turbo", messages=messages, context_window_fallback_dict=fallback_dict)
44
+ ```
45
+
46
+ ### Fallbacks - Switch Models/API Keys/API Bases
47
+
48
+ LLM APIs can be unstable. `completion()` with fallbacks helps ensure you still get a response even when the primary call fails.
49
+
50
+ #### Usage
51
+ To use fallback models with `completion()`, specify a list of models in the `fallbacks` parameter.
52
+
53
+ The `fallbacks` list should include the primary model you want to use, followed by additional models that can be used as backups in case the primary model fails to provide a response.
54
+
55
+ #### switch models
56
+ ```python
57
+ response = completion(model="bad-model", messages=messages,
58
+ fallbacks=["gpt-3.5-turbo", "command-nightly"])
59
+ ```
60
+
61
+ #### switch api keys/bases (E.g. azure deployment)
62
+ Switch between different keys for the same azure deployment, or use another deployment as well.
63
+
64
+ ```python
65
+ api_key="bad-key"
66
+ response = completion(model="azure/gpt-4", messages=messages, api_key=api_key,
67
+ fallbacks=[{"api_key": "good-key-1"}, {"api_key": "good-key-2", "api_base": "good-api-base-2"}])
68
+ ```
69
+
70
+ [Check out this section for implementation details](#fallbacks-1)
71
+
72
+ ## Implementation Details
73
+
74
+ ### Fallbacks
75
+ #### Output from calls
76
+ ```
77
+ Completion with 'bad-model': got exception Unable to map your input to a model. Check your input - {'model': 'bad-model'
78
+
79
+
80
+
81
+ completion call gpt-3.5-turbo
82
+ {
83
+ "id": "chatcmpl-7qTmVRuO3m3gIBg4aTmAumV1TmQhB",
84
+ "object": "chat.completion",
85
+ "created": 1692741891,
86
+ "model": "gpt-3.5-turbo-0613",
87
+ "choices": [
88
+ {
89
+ "index": 0,
90
+ "message": {
91
+ "role": "assistant",
92
+ "content": "I apologize, but as an AI, I do not have the capability to provide real-time weather updates. However, you can easily check the current weather in San Francisco by using a search engine or checking a weather website or app."
93
+ },
94
+ "finish_reason": "stop"
95
+ }
96
+ ],
97
+ "usage": {
98
+ "prompt_tokens": 16,
99
+ "completion_tokens": 46,
100
+ "total_tokens": 62
101
+ }
102
+ }
103
+
104
+ ```
105
+
106
+ #### How fallbacks work
107
+
108
+ When you pass `fallbacks` to `completion`, it makes the first `completion` call using the primary model specified as `model` in `completion(model=model)`. If the primary model fails or encounters an error, it automatically tries the `fallbacks` models in the specified order. This ensures a response even if the primary model is unavailable.
109
+
110
+
111
+ #### Key components of Model Fallbacks implementation:
112
+ * Looping through `fallbacks`
113
+ * Cool-Downs for rate-limited models
114
+
115
+ #### Looping through `fallbacks`
116
+ Allow `45 seconds` for the request overall. Within that window, this function first tries the primary model set as `model`; if that model fails, it loops through the backup `fallbacks` models and attempts to get a response within the allocated `45s`, set here:
117
+ ```python
118
+ while response == None and time.time() - start_time < 45:
119
+ for model in fallbacks:
120
+ ```
121
+
122
+ #### Cool-Downs for rate-limited models
123
+ If a model API call leads to an error, put that model in a `60s` cool-down before trying it again
124
+ ```python
125
+ except Exception as e:
126
+ print(f"got exception {e} for model {model}")
127
+ rate_limited_models.add(model)
128
+ model_expiration_times[model] = (
129
+ time.time() + 60
130
+ ) # cool down this selected model
131
+ pass
132
+ ```
133
+
134
+ Before making an LLM API call we check if the selected model is in `rate_limited_models`, if so skip making the API call
135
+ ```python
136
+ if (
137
+ model in rate_limited_models
138
+ ): # check if model is currently cooling down
139
+ if (
140
+ model_expiration_times.get(model)
141
+ and time.time() >= model_expiration_times[model]
142
+ ):
143
+ rate_limited_models.remove(
144
+ model
145
+ ) # check if it's been 60s of cool down and remove model
146
+ else:
147
+ continue # skip model
148
+
149
+ ```
150
+
151
+ #### Full code of completion with fallbacks()
152
+ ```python
153
+
154
+ response = None
155
+ rate_limited_models = set()
156
+ model_expiration_times = {}
157
+ start_time = time.time()
158
+ fallbacks = [kwargs["model"]] + kwargs["fallbacks"]
159
+ del kwargs["fallbacks"] # remove fallbacks so it's not recursive
160
+
161
+ while response == None and time.time() - start_time < 45:
162
+ for model in fallbacks:
163
+ # loop thru all models
164
+ try:
165
+ if (
166
+ model in rate_limited_models
167
+ ): # check if model is currently cooling down
168
+ if (
169
+ model_expiration_times.get(model)
170
+ and time.time() >= model_expiration_times[model]
171
+ ):
172
+ rate_limited_models.remove(
173
+ model
174
+ ) # check if it's been 60s of cool down and remove model
175
+ else:
176
+ continue # skip model
177
+
178
+ # delete model from kwargs if it exists
179
+ if kwargs.get("model"):
180
+ del kwargs["model"]
181
+
182
+ print("making completion call", model)
183
+ response = litellm.completion(**kwargs, model=model)
184
+
185
+ if response != None:
186
+ return response
187
+
188
+ except Exception as e:
189
+ print(f"got exception {e} for model {model}")
190
+ rate_limited_models.add(model)
191
+ model_expiration_times[model] = (
192
+ time.time() + 60
193
+ ) # cool down this selected model
194
+ pass
195
+ return response
196
+ ```
docs/my-website/docs/completion/stream.md ADDED
@@ -0,0 +1,76 @@
1
+ # Streaming + Async
2
+
3
+ - [Streaming Responses](#streaming-responses)
4
+ - [Async Completion](#async-completion)
5
+ - [Async + Streaming Completion](#async-streaming)
6
+
7
+ ## Streaming Responses
8
+ LiteLLM supports streaming the model response back by passing `stream=True` as an argument to the completion function
9
+ ### Usage
10
+ ```python
11
+ from litellm import completion
12
+ messages = [{"role": "user", "content": "Hey, how's it going?"}]
13
+ response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
14
+ for part in response:
15
+ print(part.choices[0].delta.content or "")
16
+ ```
17
+
18
+ ### Helper function
19
+
20
+ LiteLLM also exposes a helper function to rebuild the complete streaming response from the list of chunks.
21
+
22
+ ```python
23
+ import litellm
+ from litellm import completion
24
+ messages = [{"role": "user", "content": "Hey, how's it going?"}]
25
+ response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
26
+
27
+ chunks = []
+ for chunk in response:
28
+ chunks.append(chunk)
29
+
30
+ print(litellm.stream_chunk_builder(chunks, messages=messages))
31
+ ```
32
+
33
+ ## Async Completion
34
+ Asynchronous Completion with LiteLLM. LiteLLM provides an asynchronous version of the completion function called `acompletion`
35
+ ### Usage
36
+ ```python
37
+ from litellm import acompletion
38
+ import asyncio
39
+
40
+ async def test_get_response():
41
+ user_message = "Hello, how are you?"
42
+ messages = [{"content": user_message, "role": "user"}]
43
+ response = await acompletion(model="gpt-3.5-turbo", messages=messages)
44
+ return response
45
+
46
+ response = asyncio.run(test_get_response())
47
+ print(response)
48
+
49
+ ```
50
+
51
+ ## Async Streaming
52
+ We've implemented an `__anext__()` function in the streaming object returned. This enables async iteration over the streaming object.
53
+
54
+ ### Usage
55
+ Here's an example of using it with openai.
56
+ ```python
57
+ from litellm import acompletion
58
+ import asyncio, os, traceback
59
+
60
+ async def completion_call():
61
+ try:
62
+ print("test acompletion + streaming")
63
+ response = await acompletion(
64
+ model="gpt-3.5-turbo",
65
+ messages=[{"content": "Hello, how are you?", "role": "user"}],
66
+ stream=True
67
+ )
68
+ print(f"response: {response}")
69
+ async for chunk in response:
70
+ print(chunk)
71
+ except:
72
+ print(f"error occurred: {traceback.format_exc()}")
73
+ pass
74
+
75
+ asyncio.run(completion_call())
76
+ ```
docs/my-website/docs/completion/token_usage.md ADDED
@@ -0,0 +1,154 @@
1
+ # Completion Token Usage & Cost
2
+ By default LiteLLM returns token usage in all completion requests ([See here](https://litellm.readthedocs.io/en/latest/output/))
3
+
4
+ However, we also expose 8 helper functions + **[NEW]** an API to calculate token usage across providers:
5
+
6
+ - `encode`: This encodes the text passed in, using the model-specific tokenizer. [**Jump to code**](#1-encode)
7
+
8
+ - `decode`: This decodes the tokens passed in, using the model-specific tokenizer. [**Jump to code**](#2-decode)
9
+
10
+ - `token_counter`: This returns the number of tokens for a given input - it uses the tokenizer based on the model, and defaults to tiktoken if no model-specific tokenizer is available. [**Jump to code**](#3-token_counter)
11
+
12
+ - `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#4-cost_per_token)
13
+
14
+ - `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#5-completion_cost)
15
+
16
+ - `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#6-get_max_tokens)
17
+
18
+ - `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#7-model_cost)
19
+
20
+ - `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#8-register_model)
21
+
22
+ - `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#9-apilitellmai)
23
+
24
+ πŸ“£ This is a community maintained list. Contributions are welcome! ❀️
25
+
26
+ ## Example Usage
27
+
28
+ ### 1. `encode`
29
+ Encoding has model-specific tokenizers for anthropic, cohere, llama2 and openai. If an unsupported model is passed in, it'll default to using tiktoken (openai's tokenizer).
30
+
31
+ ```python
32
+ from litellm import encode, decode
33
+
34
+ sample_text = "HellΓΆ World, this is my input string!"
35
+ # openai encoding + decoding
36
+ openai_tokens = encode(model="gpt-3.5-turbo", text=sample_text)
37
+ print(openai_tokens)
38
+ ```
39
+
40
+ ### 2. `decode`
41
+
42
+ Decoding is supported for anthropic, cohere, llama2 and openai.
43
+
44
+ ```python
45
+ from litellm import encode, decode
46
+
47
+ sample_text = "HellΓΆ World, this is my input string!"
48
+ # openai encoding + decoding
49
+ openai_tokens = encode(model="gpt-3.5-turbo", text=sample_text)
50
+ openai_text = decode(model="gpt-3.5-turbo", tokens=openai_tokens)
51
+ print(openai_text)
52
+ ```
53
+
54
+ ### 3. `token_counter`
55
+
56
+ ```python
57
+ from litellm import token_counter
58
+
59
+ messages = [{"role": "user", "content": "Hey, how's it going"}]
60
+ print(token_counter(model="gpt-3.5-turbo", messages=messages))
61
+ ```
62
+
63
+ ### 4. `cost_per_token`
64
+
65
+ ```python
66
+ from litellm import cost_per_token
67
+
68
+ prompt_tokens = 5
69
+ completion_tokens = 10
70
+ prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token(model="gpt-3.5-turbo", prompt_tokens=prompt_tokens, completion_tokens=completion_tokens)
71
+
72
+ print(prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar)
73
+ ```
74
+
75
+ ### 5. `completion_cost`
76
+
77
+ * Input: Accepts a `litellm.completion()` response **OR** prompt + completion strings
78
+ * Output: Returns a `float` of cost for the `completion` call
79
+
80
+ **litellm.completion()**
81
+ ```python
82
+ from litellm import completion, completion_cost
83
+
84
+ messages = [{"role": "user", "content": "Hey, how's it going?"}]
+ response = completion(
85
+ model="bedrock/anthropic.claude-v2",
86
+ messages=messages,
87
+ request_timeout=200,
88
+ )
89
+ # pass your response from completion to completion_cost
90
+ cost = completion_cost(completion_response=response)
91
+ formatted_string = f"${float(cost):.10f}"
92
+ print(formatted_string)
93
+ ```
94
+
95
+ **prompt + completion string**
96
+ ```python
97
+ from litellm import completion_cost
98
+ cost = completion_cost(model="bedrock/anthropic.claude-v2", prompt="Hey!", completion="How's it going?")
99
+ formatted_string = f"${float(cost):.10f}"
100
+ print(formatted_string)
101
+ ```
102
+ ### 6. `get_max_tokens`
103
+
104
+ Input: Accepts a model name - e.g., gpt-3.5-turbo (to get a complete list, call litellm.model_list).
105
+ Output: Returns the maximum number of tokens allowed for the given model
106
+
107
+ ```python
108
+ from litellm import get_max_tokens
109
+
110
+ model = "gpt-3.5-turbo"
111
+
112
+ print(get_max_tokens(model)) # Output: 4097
113
+ ```
114
+
115
+ ### 7. `model_cost`
116
+
117
+ * Output: Returns a dict object containing the max_tokens, input_cost_per_token, output_cost_per_token for all models on [community-maintained list](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
118
+
119
+ ```python
120
+ from litellm import model_cost
121
+
122
+ print(model_cost) # {'gpt-3.5-turbo': {'max_tokens': 4000, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06}, ...}
123
+ ```
124
+
125
+ ### 8. `register_model`
126
+
127
+ * Input: Provide EITHER a model cost dictionary or a url to a hosted json blob
128
+ * Output: Returns updated model_cost dictionary + updates litellm.model_cost with model details.
129
+
130
+ **Dictionary**
131
+ ```python
132
+ import litellm
133
+
134
+ litellm.register_model({
135
+ "gpt-4": {
136
+ "max_tokens": 8192,
137
+ "input_cost_per_token": 0.00002,
138
+ "output_cost_per_token": 0.00006,
139
+ "litellm_provider": "openai",
140
+ "mode": "chat"
141
+ },
142
+ })
143
+ ```
144
+
145
+ **URL for json blob**
146
+ ```python
147
+ import litellm
148
+
149
+ litellm.register_model(model_cost=
150
+ "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json")
151
+ ```
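+
+ ### 9. `api.litellm.ai`
+
+ The live token + price list is backed by the community-maintained JSON blob linked above. A minimal sketch of reading it directly (this uses the raw GitHub URL shown above; the hosted `api.litellm.ai` endpoints may differ):
+
+ ```python
+ import requests
+
+ url = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json"
+ prices = requests.get(url).json()
+ print(prices["gpt-3.5-turbo"])  # max_tokens, input_cost_per_token, output_cost_per_token, ...
+ ```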
152
+
153
+
154
+
docs/my-website/docs/contact.md ADDED
@@ -0,0 +1,6 @@
1
+ # Contact Us
2
+
3
+ [![](https://dcbadge.vercel.app/api/server/wuPM9dRgDw)](https://discord.gg/wuPM9dRgDw)
4
+
5
+ * [Meet with us πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
6
docs/my-website/docs/debugging/hosted_debugging.md ADDED
@@ -0,0 +1,91 @@
1
+ import Image from '@theme/IdealImage';
2
+ import QueryParamReader from '../../src/components/queryParamReader.js'
3
+
4
+ # [Beta] Monitor Logs in Production
5
+
6
+ :::note
7
+
8
+ This is in beta. Expect frequent updates, as we improve based on your feedback.
9
+
10
+ :::
11
+
12
+ LiteLLM provides an integration to let you monitor logs in production.
13
+
14
+ πŸ‘‰ Jump to our sample LiteLLM Dashboard: https://admin.litellm.ai/
15
+
16
+
17
+ <Image img={require('../../img/alt_dashboard.png')} alt="Dashboard" />
18
+
19
+ ## Debug your first logs
20
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_OpenAI.ipynb">
21
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
22
+ </a>
23
+
24
+
25
+ ### 1. Get your LiteLLM Token
26
+
27
+ Go to [admin.litellm.ai](https://admin.litellm.ai/) and copy the code snippet with your unique token
28
+
29
+ <Image img={require('../../img/hosted_debugger_usage_page.png')} alt="Usage" />
30
+
31
+ ### 2. Set up your environment
32
+
33
+ **Add it to your .env**
34
+
35
+ ```python
36
+ import os
37
+
38
+ os.environ["LITELLM_TOKEN"] = "e24c4c06-d027-4c30-9e78-18bc3a50aebb" # replace with your unique token
39
+
40
+ ```
41
+
42
+ **Turn on LiteLLM Client**
43
+ ```python
44
+ import litellm
45
+ litellm.client = True
46
+ ```
47
+
48
+ ### 3. Make a normal `completion()` call
49
+ ```python
50
+ import litellm
51
+ from litellm import completion
52
+ import os
53
+
54
+ # set env variables
55
+ os.environ["LITELLM_TOKEN"] = "e24c4c06-d027-4c30-9e78-18bc3a50aebb" # replace with your unique token
56
+ os.environ["OPENAI_API_KEY"] = "openai key"
57
+
58
+ litellm.use_client = True # enable logging dashboard
59
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
60
+
61
+ # openai call
62
+ response = completion(model="gpt-3.5-turbo", messages=messages)
63
+ ```
64
+
65
+ Your `completion()` call will print a link to your session dashboard (https://admin.litellm.ai/<your_unique_token>)
66
+
67
+ In the above case it would be: [`admin.litellm.ai/e24c4c06-d027-4c30-9e78-18bc3a50aebb`](https://admin.litellm.ai/e24c4c06-d027-4c30-9e78-18bc3a50aebb)
68
+
69
+ Click on your personal dashboard link. Here's how you can find it πŸ‘‡
70
+
71
+ <Image img={require('../../img/dash_output.png')} alt="Dashboard" />
72
+
73
+ [πŸ‘‹ Tell us if you need better privacy controls](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version?month=2023-08)
74
+
75
+ ### 4. Review request log
76
+
77
+ Oh! Looks like our request was made successfully. Let's click on it and see exactly what got sent to the LLM provider.
78
+
79
+ <Image img={require('../../img/dashboard_log_row.png')} alt="Dashboard Log Row" />
80
+
81
+
82
+
83
+ Ah! So we can see that this request was made to **Baseten** (see litellm_params > custom_llm_provider) for a model with ID **7qQNLDB** (see model). The message sent was - `"Hey, how's it going?"` and the response received was - `"As an AI language model, I don't have feelings or emotions, but I can assist you with your queries. How can I assist you today?"`
84
+
85
+ <Image img={require('../../img/dashboard_log.png')} alt="Dashboard Log Row" />
86
+
87
+ :::info
88
+
89
+ πŸŽ‰ Congratulations! You've successfully debugged your first log!
90
+
91
+ :::
docs/my-website/docs/debugging/local_debugging.md ADDED
@@ -0,0 +1,64 @@
1
+ # Local Debugging
2
+ There are two ways to do local debugging - setting `litellm.set_verbose=True` and passing in a custom logging function via `completion(..., logger_fn=<your_local_function>)`. Warning: make sure not to use `set_verbose` in production - it logs API keys, which might end up in log files.
3
+
4
+ ## Set Verbose
5
+
6
+ This is good for getting print statements for everything litellm is doing.
7
+ ```python
8
+ import litellm
9
+ from litellm import completion
+ import os
10
+
11
+ litellm.set_verbose=True # πŸ‘ˆ this is the 1-line change you need to make
12
+
13
+ ## set ENV variables
14
+ os.environ["OPENAI_API_KEY"] = "openai key"
15
+ os.environ["COHERE_API_KEY"] = "cohere key"
16
+
17
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
18
+
19
+ # openai call
20
+ response = completion(model="gpt-3.5-turbo", messages=messages)
21
+
22
+ # cohere call
23
+ response = completion("command-nightly", messages)
24
+ ```
25
+
26
+ ## Logger Function
27
+ Sometimes all you care about is seeing exactly what's being sent to the API and what's being returned - e.g. if the API call is failing, why is that happening? What are the exact params being sent?
28
+
29
+ In that case, LiteLLM allows you to pass in a custom logging function to see / modify the model call Input/Outputs.
30
+
31
+ **Note**: We expect you to accept a dict object.
32
+
33
+ Your custom function
34
+
35
+ ```python
36
+ def my_custom_logging_fn(model_call_dict):
37
+ print(f"model call details: {model_call_dict}")
38
+ ```
39
+
40
+ ### Complete Example
41
+ ```python
42
+ from litellm import completion
+ import os
43
+
44
+ def my_custom_logging_fn(model_call_dict):
45
+ print(f"model call details: {model_call_dict}")
46
+
47
+ ## set ENV variables
48
+ os.environ["OPENAI_API_KEY"] = "openai key"
49
+ os.environ["COHERE_API_KEY"] = "cohere key"
50
+
51
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
52
+
53
+ # openai call
54
+ response = completion(model="gpt-3.5-turbo", messages=messages, logger_fn=my_custom_logging_fn)
55
+
56
+ # cohere call
57
+ response = completion("command-nightly", messages, logger_fn=my_custom_logging_fn)
58
+ ```
59
+
60
+ ## Still Seeing Issues?
61
+
62
+ Text us @ +17708783106 or Join the [Discord](https://discord.com/invite/wuPM9dRgDw).
63
+
64
+ We promise to help you at `lite`ning speed ❀️
docs/my-website/docs/default_code_snippet.md ADDED
@@ -0,0 +1,22 @@
1
+ ---
2
+ displayed_sidebar: tutorialSidebar
3
+ ---
4
+ # Get Started
5
+
6
+ import QueryParamReader from '../src/components/queryParamReader.js'
7
+ import TokenComponent from '../src/components/queryParamToken.js'
8
+
9
+ :::info
10
+
11
+ This section assumes you've already added your API keys in <TokenComponent/>
12
+
13
+ If you want to use the non-hosted version, [go here](https://docs.litellm.ai/docs/#quick-start)
14
+
15
+ :::
16
+
17
+
18
+ ```
19
+ pip install litellm
20
+ ```
21
+
22
+ <QueryParamReader/>
docs/my-website/docs/embedding/async_embedding.md ADDED
@@ -0,0 +1,15 @@
1
+ # Async Embedding
2
+
3
+ LiteLLM provides an asynchronous version of the `embedding` function called `aembedding`
4
+ ### Usage
5
+ ```python
6
+ from litellm import aembedding
7
+ import asyncio
8
+
9
+ async def test_get_response():
10
+ response = await aembedding('text-embedding-ada-002', input=["good morning from litellm"])
11
+ return response
12
+
13
+ response = asyncio.run(test_get_response())
14
+ print(response)
15
+ ```
docs/my-website/docs/embedding/moderation.md ADDED
@@ -0,0 +1,10 @@
1
+ # Moderation
2
+ LiteLLM supports the moderation endpoint for OpenAI
3
+
4
+ ## Usage
5
+ ```python
6
+ import os
7
+ from litellm import moderation
8
+ os.environ['OPENAI_API_KEY'] = ""
9
+ response = moderation(input="i'm ishaan cto of litellm")
10
+ ```
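+
+ The response follows OpenAI's moderation response format, so you can inspect it roughly like this (a sketch - the `results[0]["flagged"]` / `["categories"]` field names assume the standard OpenAI moderation response):
+ ```python
+ result = response["results"][0]
+ print(result["flagged"])     # True / False
+ print(result["categories"])  # per-category booleans
+ ```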
docs/my-website/docs/embedding/supported_embedding.md ADDED
@@ -0,0 +1,201 @@
1
+ # Embedding Models
2
+
3
+ ## Quick Start
4
+ ```python
5
+ from litellm import embedding
6
+ import os
7
+ os.environ['OPENAI_API_KEY'] = ""
8
+ response = embedding(model='text-embedding-ada-002', input=["good morning from litellm"])
9
+ ```
10
+
11
+ ### Input Params for `litellm.embedding()`
12
+ ### Required Fields
13
+
14
+ - `model`: *string* - ID of the model to use. `model='text-embedding-ada-002'`
15
+
16
+ - `input`: *array* - Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays. The input must not exceed the max input tokens for the model (8192 tokens for text-embedding-ada-002), cannot be an empty string, and any array must be 2048 dimensions or less.
17
+ ```
18
+ input=["good morning from litellm"]
19
+ ```
20
+
21
+ ### Optional LiteLLM Fields
22
+
23
+ - `user`: *string (optional)* - A unique identifier representing your end-user.
24
+
25
+ - `timeout`: *integer* - The maximum time, in seconds, to wait for the API to respond. Defaults to 600 seconds (10 minutes).
26
+
27
+ - `api_base`: *string (optional)* - The api endpoint you want to call the model with
28
+
29
+ - `api_version`: *string (optional)* - (Azure-specific) the api version for the call
30
+
31
+ - `api_key`: *string (optional)* - The API key to authenticate and authorize requests. If not provided, the default API key is used.
32
+
33
+ - `api_type`: *string (optional)* - The type of API to use.
34
+
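+ These optional fields can be passed straight into `litellm.embedding()` as keyword arguments - a minimal sketch (the API key, timeout and user values below are placeholders):
+
+ ```python
+ from litellm import embedding
+
+ response = embedding(
+     model="text-embedding-ada-002",
+     input=["good morning from litellm"],
+     api_key="sk-...",   # overrides os.environ['OPENAI_API_KEY'] for this call
+     timeout=60,         # wait at most 60 seconds for the API to respond
+     user="user-1234",   # tag the request with your end-user id
+ )
+ ```
+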
35
+ ### Output from `litellm.embedding()`
36
+
37
+ ```json
38
+ {
39
+ "object": "list",
40
+ "data": [
41
+ {
42
+ "object": "embedding",
43
+ "index": 0,
44
+ "embedding": [
45
+ -0.0022326677571982145,
46
+ 0.010749882087111473,
47
+ ...
48
+ ...
49
+ ...
50
+
51
+ ]
52
+ }
53
+ ],
54
+ "model": "text-embedding-ada-002-v2",
55
+ "usage": {
56
+ "prompt_tokens": 10,
57
+ "total_tokens": 10
58
+ }
59
+ }
60
+ ```
61
+
62
+ ## OpenAI Embedding Models
63
+
64
+ ### Usage
65
+ ```python
66
+ from litellm import embedding
67
+ import os
68
+ os.environ['OPENAI_API_KEY'] = ""
69
+ response = embedding('text-embedding-ada-002', input=["good morning from litellm"])
70
+ ```
71
+
72
+ | Model Name | Function Call | Required OS Variables |
73
+ |----------------------|---------------------------------------------|--------------------------------------|
74
+ | text-embedding-ada-002 | `embedding('text-embedding-ada-002', input)` | `os.environ['OPENAI_API_KEY']` |
75
+
76
+ ## Azure OpenAI Embedding Models
77
+
78
+ ### API keys
79
+ This can be set as env variables or passed as **params to litellm.embedding()**
80
+ ```python
81
+ import os
82
+ os.environ['AZURE_API_KEY'] = ""
83
+ os.environ['AZURE_API_BASE'] = ""
84
+ os.environ['AZURE_API_VERSION'] = ""
85
+ ```
86
+
87
+ ### Usage
88
+ ```python
89
+ from litellm import embedding
90
+ response = embedding(
91
+ model="azure/<your deployment name>",
92
+ input=["good morning from litellm"],
93
+ api_key=api_key,
94
+ api_base=api_base,
95
+ api_version=api_version,
96
+ )
97
+ print(response)
98
+ ```
99
+
100
+ | Model Name | Function Call |
101
+ |----------------------|---------------------------------------------|
102
+ | text-embedding-ada-002 | `embedding(model="azure/<your deployment name>", input=input)` |
103
+
104
+ h/t to [Mikko](https://www.linkedin.com/in/mikkolehtimaki/) for this integration
105
+
106
+ ## OpenAI Compatible Embedding Models
107
+ Use this for calling `/embedding` endpoints on OpenAI Compatible Servers, example https://github.com/xorbitsai/inference
108
+
109
+ **Note add `openai/` prefix to model so litellm knows to route to OpenAI**
110
+
111
+ ### Usage
112
+ ```python
113
+ from litellm import embedding
114
+ response = embedding(
115
+ model = "openai/<your-llm-name>", # add `openai/` prefix to model so litellm knows to route to OpenAI
116
+ api_base="http://0.0.0.0:8000/", # set API Base of your Custom OpenAI Endpoint
117
+ input=["good morning from litellm"]
118
+ )
119
+ ```
120
+
121
+ ## Bedrock Embedding
122
+
123
+ ### API keys
124
+ This can be set as env variables or passed as **params to litellm.embedding()**
125
+ ```python
126
+ import os
127
+ os.environ["AWS_ACCESS_KEY_ID"] = "" # Access key
128
+ os.environ["AWS_SECRET_ACCESS_KEY"] = "" # Secret access key
129
+ os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
130
+ ```
131
+
132
+ ### Usage
133
+ ```python
134
+ from litellm import embedding
135
+ response = embedding(
136
+ model="amazon.titan-embed-text-v1",
137
+ input=["good morning from litellm"],
138
+ )
139
+ print(response)
140
+ ```
141
+
142
+ | Model Name | Function Call |
143
+ |----------------------|---------------------------------------------|
144
+ | Titan Embeddings - G1 | `embedding(model="amazon.titan-embed-text-v1", input=input)` |
145
+
146
+
147
+ ## Cohere Embedding Models
148
+ https://docs.cohere.com/reference/embed
149
+
150
+ ### Usage
151
+ ```python
152
+ from litellm import embedding
+ import os
153
+ os.environ["COHERE_API_KEY"] = "cohere key"
154
+
155
+ # cohere call
156
+ response = embedding(
157
+ model="embed-english-v3.0",
158
+ input=["good morning from litellm", "this is another item"],
159
+ input_type="search_document" # optional param for v3 llms
160
+ )
161
+ ```
162
+ | Model Name | Function Call |
163
+ |--------------------------|--------------------------------------------------------------|
164
+ | embed-english-v3.0 | `embedding(model="embed-english-v3.0", input=["good morning from litellm", "this is another item"])` |
165
+ | embed-english-light-v3.0 | `embedding(model="embed-english-light-v3.0", input=["good morning from litellm", "this is another item"])` |
166
+ | embed-multilingual-v3.0 | `embedding(model="embed-multilingual-v3.0", input=["good morning from litellm", "this is another item"])` |
167
+ | embed-multilingual-light-v3.0 | `embedding(model="embed-multilingual-light-v3.0", input=["good morning from litellm", "this is another item"])` |
168
+ | embed-english-v2.0 | `embedding(model="embed-english-v2.0", input=["good morning from litellm", "this is another item"])` |
169
+ | embed-english-light-v2.0 | `embedding(model="embed-english-light-v2.0", input=["good morning from litellm", "this is another item"])` |
170
+ | embed-multilingual-v2.0 | `embedding(model="embed-multilingual-v2.0", input=["good morning from litellm", "this is another item"])` |
171
+
172
+ ## HuggingFace Embedding Models
173
+ LiteLLM supports all Feature-Extraction Embedding models: https://huggingface.co/models?pipeline_tag=feature-extraction
174
+
175
+ ### Usage
176
+ ```python
177
+ from litellm import embedding
178
+ import os
179
+ os.environ['HUGGINGFACE_API_KEY'] = ""
180
+ response = embedding(
181
+ model='huggingface/microsoft/codebert-base',
182
+ input=["good morning from litellm"]
183
+ )
184
+ ```
185
+ ### Usage - Custom API Base
186
+ ```python
187
+ from litellm import embedding
188
+ import os
189
+ os.environ['HUGGINGFACE_API_KEY'] = ""
190
+ response = embedding(
191
+ model='huggingface/microsoft/codebert-base',
192
+ input=["good morning from litellm"],
193
+ api_base = "https://p69xlsj6rpno5drq.us-east-1.aws.endpoints.huggingface.cloud"
194
+ )
195
+ ```
196
+
197
+ | Model Name | Function Call | Required OS Variables |
198
+ |-----------------------|--------------------------------------------------------------|-------------------------------------------------|
199
+ | microsoft/codebert-base | `embedding('huggingface/microsoft/codebert-base', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
200
+ | BAAI/bge-large-zh | `embedding('huggingface/BAAI/bge-large-zh', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
201
+ | any-hf-embedding-model | `embedding('huggingface/hf-embedding-model', input=input)` | `os.environ['HUGGINGFACE_API_KEY']` |
docs/my-website/docs/exception_mapping.md ADDED
@@ -0,0 +1,102 @@
1
+ # Exception Mapping
2
+
3
+ LiteLLM maps exceptions across all providers to their OpenAI counterparts.
4
+ - Rate Limit Errors
5
+ - Invalid Request Errors
6
+ - Authentication Errors
7
+ - Timeout Errors `openai.APITimeoutError`
8
+ - ServiceUnavailableError
9
+ - APIError
10
+ - APIConnectionError
11
+
12
+ In the base case, we return an `APIConnectionError`.
13
+
14
+ All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that should work out of the box with LiteLLM.
15
+
16
+ For all cases, the exception returned inherits from the original OpenAI Exception but contains 3 additional attributes:
17
+ * status_code - the http status code of the exception
18
+ * message - the error message
19
+ * llm_provider - the provider raising the exception
20
+
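+ For example, after catching a mapped exception you can read these attributes off it - a minimal sketch (the printed values are illustrative):
+
+ ```python
+ import litellm
+ import openai
+
+ try:
+     litellm.completion(
+         model="command-nightly",
+         messages=[{"role": "user", "content": "Hey, how's it going?"}],
+     )
+ except openai.OpenAIError as e:
+     # extra attributes added by litellm's exception mapping
+     print(e.status_code)   # http status code, e.g. 401
+     print(e.message)       # the error message
+     print(e.llm_provider)  # the provider that raised it, e.g. "cohere"
+ ```
+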
21
+ ## Usage
22
+
23
+ ```python
24
+ import litellm
25
+ import openai
26
+
27
+ try:
28
+ response = litellm.completion(
29
+ model="gpt-4",
30
+ messages=[
31
+ {
32
+ "role": "user",
33
+ "content": "hello, write a 20 page essay"
34
+ }
35
+ ],
36
+ timeout=0.01, # this will raise a timeout exception
37
+ )
38
+ except openai.APITimeoutError as e:
39
+ print("Passed: Raised correct exception. Got openai.APITimeoutError\nGood Job", e)
40
+ print(type(e))
41
+ pass
42
+ ```
43
+
44
+ ## Usage - Catching Streaming Exceptions
45
+ ```python
46
+ import litellm
+ import openai
47
+ try:
48
+ response = litellm.completion(
49
+ model="gpt-3.5-turbo",
50
+ messages=[
51
+ {
52
+ "role": "user",
53
+ "content": "hello, write a 20 pg essay"
54
+ }
55
+ ],
56
+ timeout=0.0001, # this will raise an exception
57
+ stream=True,
58
+ )
59
+ for chunk in response:
60
+ print(chunk)
61
+ except openai.APITimeoutError as e:
62
+ print("Passed: Raised correct exception. Got openai.APITimeoutError\nGood Job", e)
63
+ print(type(e))
64
+ pass
65
+ except Exception as e:
66
+ print(f"Did not raise error `openai.APITimeoutError`. Instead raised error type: {type(e)}, Error: {e}")
67
+
68
+ ```
69
+
70
+ ## Details
71
+
72
+ To see how it's implemented - [check out the code](https://github.com/BerriAI/litellm/blob/a42c197e5a6de56ea576c73715e6c7c6b19fa249/litellm/utils.py#L1217)
73
+
74
+ [Create an issue](https://github.com/BerriAI/litellm/issues/new) **or** [make a PR](https://github.com/BerriAI/litellm/pulls) if you want to improve the exception mapping.
75
+
76
+ **Note** For OpenAI and Azure we return the original exception (since they're of the OpenAI Error type). But we add the 'llm_provider' attribute to them. [See code](https://github.com/BerriAI/litellm/blob/a42c197e5a6de56ea576c73715e6c7c6b19fa249/litellm/utils.py#L1221)
77
+
78
+ ## Custom mapping list
79
+
80
+ Base case - we return the original exception.
81
+
82
+ | | ContextWindowExceededError | AuthenticationError | InvalidRequestError | RateLimitError | ServiceUnavailableError |
83
+ |---------------|----------------------------|---------------------|---------------------|---------------|-------------------------|
84
+ | Anthropic | βœ… | βœ… | βœ… | βœ… | |
85
+ | OpenAI | βœ… | βœ… |βœ… |βœ… |βœ…|
86
+ | Replicate | βœ… | βœ… | βœ… | βœ… | βœ… |
87
+ | Cohere | βœ… | βœ… | βœ… | βœ… | βœ… |
88
+ | Huggingface | βœ… | βœ… | βœ… | βœ… | |
89
+ | Openrouter | βœ… | βœ… | βœ… | βœ… | |
90
+ | AI21 | βœ… | βœ… | βœ… | βœ… | |
91
+ | VertexAI | | |βœ… | | |
92
+ | Bedrock | | |βœ… | | |
93
+ | Sagemaker | | |βœ… | | |
94
+ | TogetherAI | βœ… | βœ… | βœ… | βœ… | |
95
+ | AlephAlpha | βœ… | βœ… | βœ… | βœ… | βœ… |
96
+
97
+
98
+ > For a deeper understanding of these exceptions, you can check out [this](https://github.com/BerriAI/litellm/blob/d7e58d13bf9ba9edbab2ab2f096f3de7547f35fa/litellm/utils.py#L1544) implementation for additional insights.
99
+
100
+ The `ContextWindowExceededError` is a sub-class of `InvalidRequestError`. It was introduced to provide more granularity for exception-handling scenarios. Please refer to [this issue to learn more](https://github.com/BerriAI/litellm/issues/228).
101
+
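+ Because of that sub-classing, you can handle the narrower error first and fall back to the broader one - a sketch (it assumes both exception classes are importable from the top-level `litellm` package):
+
+ ```python
+ import litellm
+
+ long_messages = [{"role": "user", "content": "hello " * 100000}]
+
+ try:
+     litellm.completion(model="gpt-3.5-turbo", messages=long_messages)
+ except litellm.ContextWindowExceededError:
+     # specific case: trim the prompt (or switch to a larger-context model) and retry
+     pass
+ except litellm.InvalidRequestError as e:
+     # any other invalid request
+     print(e)
+ ```
+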
102
+ Contributions to improve exception mapping are [welcome](https://github.com/BerriAI/litellm#contributing)
docs/my-website/docs/extras/contributing.md ADDED
@@ -0,0 +1,49 @@
1
+ # Contributing to Documentation
2
+
3
+ This website is built using [Docusaurus 2](https://docusaurus.io/), a modern static website generator.
4
+
5
+ Clone litellm
6
+ ```
7
+ git clone https://github.com/BerriAI/litellm.git
8
+ ```
9
+
10
+ ### Local setup for locally running docs
11
+
12
+ #### Installation
13
+ ```
14
+ npm install --global yarn
15
+ ```
16
+
17
+
18
+ ### Local Development
19
+
20
+ ```
21
+ cd docs/my-website
22
+ ```
23
+
24
+ Install the dependencies
25
+
26
+ ```
27
+ yarn
28
+ ```
29
+ Run website
30
+
31
+ ```
32
+ yarn start
33
+ ```
34
+ Open docs here: [http://localhost:3000/](http://localhost:3000/)
35
+
36
+
37
+
38
+ This command starts a local development server and serves the docs at [http://localhost:3000/](http://localhost:3000/). You can make changes to your Markdown files and the docs will automatically rebuild.
39
+
40
+ [Full tutorial here](https://docusaurus.io/docs)
41
+
42
+ ### Making changes to Docs
43
+ - All the docs are placed under the `docs` directory
44
+ - If you are adding a new `.md` file or editing the doc hierarchy, update the sidebar config (`sidebars.js`) in `docs/my-website`
45
+ - After testing your changes, make a change to the `main` branch of [github.com/BerriAI/litellm](https://github.com/BerriAI/litellm)
46
+
47
+
48
+
49
+
docs/my-website/docs/getting_started.md ADDED
@@ -0,0 +1,100 @@
1
+ # Getting Started
2
+
3
+ import QuickStart from '../src/components/QuickStart.js'
4
+
5
+ LiteLLM simplifies LLM API calls by mapping them all to the [OpenAI ChatCompletion format](https://platform.openai.com/docs/api-reference/chat).
6
+
7
+ ## basic usage
8
+
9
+ By default we provide a free $10 community-key to try all providers supported on LiteLLM.
10
+
11
+ ```python
12
+ from litellm import completion
+ import os
13
+
14
+ ## set ENV variables
15
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
16
+ os.environ["COHERE_API_KEY"] = "your-api-key"
17
+
18
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
19
+
20
+ # openai call
21
+ response = completion(model="gpt-3.5-turbo", messages=messages)
22
+
23
+ # cohere call
24
+ response = completion("command-nightly", messages)
25
+ ```
26
+
27
+ **Need a dedicated key?**
28
+ Email us @ [email protected]
29
+
30
+ Next Steps πŸ‘‰ [Call all supported models - e.g. Claude-2, Llama2-70b, etc.](./proxy_api.md#supported-models)
31
+
32
+ More details πŸ‘‰
33
+ * [Completion() function details](./completion/)
34
+ * [All supported models / providers on LiteLLM](./providers/)
35
+ * [Build your own OpenAI proxy](https://github.com/BerriAI/liteLLM-proxy/tree/main)
36
+
37
+ ## streaming
38
+
39
+ Same example from before. Just pass in `stream=True` in the completion args.
40
+ ```python
41
+ from litellm import completion
+ import os
42
+
43
+ ## set ENV variables
44
+ os.environ["OPENAI_API_KEY"] = "openai key"
45
+ os.environ["COHERE_API_KEY"] = "cohere key"
46
+
47
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
48
+
49
+ # openai call
50
+ response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
51
+
52
+ # cohere call
53
+ response = completion("command-nightly", messages, stream=True)
54
+
55
+ print(response)
56
+ ```
57
+
58
+ More details πŸ‘‰
59
+ * [streaming + async](./completion/stream.md)
60
+ * [tutorial for streaming Llama2 on TogetherAI](./tutorials/TogetherAI_liteLLM.md)
61
+
62
+ ## exception handling
63
+
64
+ LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that should work out of the box with LiteLLM.
65
+
66
+ ```python
67
+ from openai.error import OpenAIError
68
+ from litellm import completion
+ import os
69
+
70
+ os.environ["ANTHROPIC_API_KEY"] = "bad-key"
71
+ try:
72
+ # some code
73
+ completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
74
+ except OpenAIError as e:
75
+ print(e)
76
+ ```
77
+
78
+ ## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
79
+ LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack
80
+ ```python
81
+ import litellm
+ from litellm import completion
+ import os
82
+
83
+ ## set env variables for logging tools
84
+ os.environ["LANGFUSE_PUBLIC_KEY"] = ""
85
+ os.environ["LANGFUSE_SECRET_KEY"] = ""
86
+ os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
87
+
88
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
89
+
90
+ # set callbacks
91
+ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase
92
+
93
+ #openai call
94
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}])
95
+ ```
96
+
97
+ More details πŸ‘‰
98
+ * [exception mapping](./exception_mapping.md)
99
+ * [retries + model fallbacks for completion()](./completion/reliable_completions.md)
100
+ * [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
docs/my-website/docs/index.md ADDED
@@ -0,0 +1,402 @@
1
+ import Tabs from '@theme/Tabs';
2
+ import TabItem from '@theme/TabItem';
3
+
4
+ # LiteLLM - Getting Started
5
+
6
+ https://github.com/BerriAI/litellm
7
+
8
+ import QuickStart from '../src/components/QuickStart.js'
9
+
10
+ ## **Call 100+ LLMs using the same Input/Output Format**
11
+
12
+ ## Basic usage
13
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
14
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
15
+ </a>
16
+
17
+ ```shell
18
+ pip install litellm
19
+ ```
20
+ <Tabs>
21
+ <TabItem value="openai" label="OpenAI">
22
+
23
+ ```python
24
+ from litellm import completion
25
+ import os
26
+
27
+ ## set ENV variables
28
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
29
+
30
+ response = completion(
31
+ model="gpt-3.5-turbo",
32
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
33
+ )
34
+ ```
35
+
36
+ </TabItem>
37
+ <TabItem value="anthropic" label="Anthropic">
38
+
39
+ ```python
40
+ from litellm import completion
41
+ import os
42
+
43
+ ## set ENV variables
44
+ os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
45
+
46
+ response = completion(
47
+ model="claude-2",
48
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
49
+ )
50
+ ```
51
+
52
+ </TabItem>
53
+
54
+ <TabItem value="vertex" label="VertexAI">
55
+
56
+ ```python
57
+ from litellm import completion
58
+ import os
59
+
60
+ # auth: run 'gcloud auth application-default login'
61
+ os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
62
+ os.environ["VERTEX_LOCATION"] = "us-central1"
63
+
64
+ response = completion(
65
+ model="chat-bison",
66
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
67
+ )
68
+ ```
69
+
70
+ </TabItem>
71
+
72
+ <TabItem value="hugging" label="HuggingFace">
73
+
74
+ ```python
75
+ from litellm import completion
76
+ import os
77
+
78
+ os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
79
+
80
+ # e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
81
+ response = completion(
82
+ model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
83
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
84
+ api_base="https://my-endpoint.huggingface.cloud"
85
+ )
86
+
87
+ print(response)
88
+ ```
89
+
90
+ </TabItem>
91
+
92
+ <TabItem value="azure" label="Azure OpenAI">
93
+
94
+ ```python
95
+ from litellm import completion
96
+ import os
97
+
98
+ ## set ENV variables
99
+ os.environ["AZURE_API_KEY"] = ""
100
+ os.environ["AZURE_API_BASE"] = ""
101
+ os.environ["AZURE_API_VERSION"] = ""
102
+
103
+ # azure call
104
+ response = completion(
105
+ "azure/<your_deployment_name>",
106
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
107
+ )
108
+ ```
109
+
110
+ </TabItem>
111
+
112
+
113
+ <TabItem value="ollama" label="Ollama">
114
+
115
+ ```python
116
+ from litellm import completion
117
+
118
+ response = completion(
119
+ model="ollama/llama2",
120
+ messages = [{ "content": "Hello, how are you?","role": "user"}],
121
+ api_base="http://localhost:11434"
122
+ )
123
+ ```
124
+ </TabItem>
125
+ <TabItem value="or" label="Openrouter">
126
+
127
+ ```python
128
+ from litellm import completion
129
+ import os
130
+
131
+ ## set ENV variables
132
+ os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
133
+
134
+ response = completion(
135
+ model="openrouter/google/palm-2-chat-bison",
136
+ messages = [{ "content": "Hello, how are you?","role": "user"}],
137
+ )
138
+ ```
139
+ </TabItem>
140
+
141
+ </Tabs>
142
+
143
+ ## Streaming
144
+ Set `stream=True` in the `completion` args.
145
+ <Tabs>
146
+ <TabItem value="openai" label="OpenAI">
147
+
148
+ ```python
149
+ from litellm import completion
150
+ import os
151
+
152
+ ## set ENV variables
153
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
154
+
155
+ response = completion(
156
+ model="gpt-3.5-turbo",
157
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
158
+ stream=True,
159
+ )
160
+
161
+ for chunk in response:
162
+ print(chunk)
163
+ ```
164
+
165
+ </TabItem>
166
+ <TabItem value="anthropic" label="Anthropic">
167
+
168
+ ```python
169
+ from litellm import completion
170
+ import os
171
+
172
+ ## set ENV variables
173
+ os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
174
+
175
+ response = completion(
176
+ model="claude-2",
177
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
178
+ stream=True,
179
+ )
180
+
181
+ for chunk in response:
182
+ print(chunk)
183
+ ```
184
+
185
+ </TabItem>
186
+
187
+ <TabItem value="vertex" label="VertexAI">
188
+
189
+ ```python
190
+ from litellm import completion
191
+ import os
192
+
193
+ # auth: run 'gcloud auth application-default login'
194
+ os.environ["VERTEX_PROJECT"] = "hardy-device-386718"
195
+ os.environ["VERTEX_LOCATION"] = "us-central1"
196
+
197
+ response = completion(
198
+ model="chat-bison",
199
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
200
+ stream=True,
201
+ )
202
+
203
+ for chunk in response:
204
+ print(chunk)
205
+ ```
206
+
207
+ </TabItem>
208
+
209
+ <TabItem value="hugging" label="HuggingFace">
210
+
211
+ ```python
212
+ from litellm import completion
213
+ import os
214
+
215
+ os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
216
+
217
+ # e.g. Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints
218
+ response = completion(
219
+ model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
220
+ messages=[{ "content": "Hello, how are you?","role": "user"}],
221
+ api_base="https://my-endpoint.huggingface.cloud",
222
+ stream=True,
223
+ )
224
+
225
+
226
+ for chunk in response:
227
+ print(chunk)
228
+ ```
229
+
230
+ </TabItem>
231
+
232
+ <TabItem value="azure" label="Azure OpenAI">
233
+
234
+ ```python
235
+ from litellm import completion
236
+ import os
237
+
238
+ ## set ENV variables
239
+ os.environ["AZURE_API_KEY"] = ""
240
+ os.environ["AZURE_API_BASE"] = ""
241
+ os.environ["AZURE_API_VERSION"] = ""
242
+
243
+ # azure call
244
+ response = completion(
245
+ "azure/<your_deployment_name>",
246
+ messages = [{ "content": "Hello, how are you?","role": "user"}],
247
+ stream=True,
248
+ )
249
+
250
+ for chunk in response:
251
+ print(chunk)
252
+ ```
253
+
254
+ </TabItem>
255
+
256
+
257
+ <TabItem value="ollama" label="Ollama">
258
+
259
+ ```python
260
+ from litellm import completion
261
+
262
+ response = completion(
263
+ model="ollama/llama2",
264
+ messages = [{ "content": "Hello, how are you?","role": "user"}],
265
+ api_base="http://localhost:11434",
266
+ stream=True,
267
+ )
268
+
269
+ for chunk in response:
270
+ print(chunk)
271
+ ```
272
+ </TabItem>
273
+ <TabItem value="or" label="Openrouter">
274
+
275
+ ```python
276
+ from litellm import completion
277
+ import os
278
+
279
+ ## set ENV variables
280
+ os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key"
281
+
282
+ response = completion(
283
+ model="openrouter/google/palm-2-chat-bison",
284
+ messages = [{ "content": "Hello, how are you?","role": "user"}],
285
+ stream=True,
286
+ )
287
+
288
+ for chunk in response:
289
+ print(chunk)
290
+ ```
291
+ </TabItem>
292
+
293
+ </Tabs>
294
+
295
+ ## Exception handling
296
+
297
+ LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that should work out of the box with LiteLLM.
298
+
299
+ ```python
300
+ from openai.error import OpenAIError
301
+ from litellm import completion
+ import os
302
+
303
+ os.environ["ANTHROPIC_API_KEY"] = "bad-key"
304
+ try:
305
+ # some code
306
+ completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}])
307
+ except OpenAIError as e:
308
+ print(e)
309
+ ```
310
+
311
+ ## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks))
312
+ LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack
313
+ ```python
314
+ import litellm
+ from litellm import completion
+ import os
315
+
316
+ ## set env variables for logging tools
317
+ os.environ["LANGFUSE_PUBLIC_KEY"] = ""
318
+ os.environ["LANGFUSE_SECRET_KEY"] = ""
319
+ os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
320
+
321
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
322
+
323
+ # set callbacks
324
+ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase
325
+
326
+ #openai call
327
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}])
328
+ ```
329
+
330
+ ## Calculate Costs, Usage, Latency
331
+
332
+ Pass the completion response to `litellm.completion_cost(completion_response=response)` and get the cost
333
+
334
+ ```python
335
+ from litellm import completion, completion_cost
336
+ import os
337
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
338
+
339
+ response = completion(
340
+ model="gpt-3.5-turbo",
341
+ messages=[{ "content": "Hello, how are you?","role": "user"}]
342
+ )
343
+
344
+ cost = completion_cost(completion_response=response)
345
+ print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
346
+ ```
347
+
348
+ **Output**
349
+ ```shell
350
+ Cost for completion call with gpt-3.5-turbo: $0.0000775000
351
+ ```
352
+
353
+ ### Track Costs, Usage, Latency for streaming
354
+ We use a custom callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback
355
+ - We define a callback function to calculate cost `def track_cost_callback()`
356
+ - In `def track_cost_callback()` we check if the stream is complete - `if "complete_streaming_response" in kwargs`
357
+ - Use `litellm.completion_cost()` to calculate cost, once the stream is complete
358
+
359
+ ```python
360
+ import litellm
+ from litellm import completion
361
+
362
+ # track_cost_callback
363
+ def track_cost_callback(
364
+ kwargs, # kwargs to completion
365
+ completion_response, # response from completion
366
+ start_time, end_time # start/end time
367
+ ):
368
+ try:
369
+ # check if it has collected an entire stream response
370
+ if "complete_streaming_response" in kwargs:
371
+ # for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
372
+ completion_response=kwargs["complete_streaming_response"]
373
+ input_text = kwargs["messages"]
374
+ output_text = completion_response["choices"][0]["message"]["content"]
375
+ response_cost = litellm.completion_cost(
376
+ model = kwargs["model"],
377
+ messages = input_text,
378
+ completion=output_text
379
+ )
380
+ print("streaming response_cost", response_cost)
381
+ except:
382
+ pass
383
+ # set callback
384
+ litellm.success_callback = [track_cost_callback] # set custom callback function
385
+
386
+ # litellm.completion() call
387
+ response = completion(
388
+ model="gpt-3.5-turbo",
389
+ messages=[
390
+ {
391
+ "role": "user",
392
+ "content": "Hi πŸ‘‹ - i'm openai"
393
+ }
394
+ ],
395
+ stream=True
396
+ )
397
+ ```
398
+
399
+ ## More details
400
+ * [exception mapping](./exception_mapping.md)
401
+ * [retries + model fallbacks for completion()](./completion/reliable_completions.md)
402
+ * [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
docs/my-website/docs/langchain/langchain.md ADDED
@@ -0,0 +1,135 @@
1
+ import Tabs from '@theme/Tabs';
2
+ import TabItem from '@theme/TabItem';
3
+
4
+ # Using ChatLiteLLM() - Langchain
5
+
6
+ ## Pre-Requisites
7
+ ```shell
8
+ !pip install litellm langchain
9
+ ```
10
+ ## Quick Start
11
+
12
+ <Tabs>
13
+ <TabItem value="openai" label="OpenAI">
14
+
15
+ ```python
16
+ import os
17
+ from langchain.chat_models import ChatLiteLLM
18
+ from langchain.prompts.chat import (
19
+ ChatPromptTemplate,
20
+ SystemMessagePromptTemplate,
21
+ AIMessagePromptTemplate,
22
+ HumanMessagePromptTemplate,
23
+ )
24
+ from langchain.schema import AIMessage, HumanMessage, SystemMessage
25
+
26
+ os.environ['OPENAI_API_KEY'] = ""
27
+ chat = ChatLiteLLM(model="gpt-3.5-turbo")
28
+ messages = [
29
+ HumanMessage(
30
+ content="what model are you"
31
+ )
32
+ ]
33
+ chat(messages)
34
+ ```
35
+
36
+ </TabItem>
37
+
38
+ <TabItem value="anthropic" label="Anthropic">
39
+
40
+ ```python
41
+ import os
42
+ from langchain.chat_models import ChatLiteLLM
43
+ from langchain.prompts.chat import (
44
+ ChatPromptTemplate,
45
+ SystemMessagePromptTemplate,
46
+ AIMessagePromptTemplate,
47
+ HumanMessagePromptTemplate,
48
+ )
49
+ from langchain.schema import AIMessage, HumanMessage, SystemMessage
50
+
51
+ os.environ['ANTHROPIC_API_KEY'] = ""
52
+ chat = ChatLiteLLM(model="claude-2", temperature=0.3)
53
+ messages = [
54
+ HumanMessage(
55
+ content="what model are you"
56
+ )
57
+ ]
58
+ chat(messages)
59
+ ```
60
+
61
+ </TabItem>
62
+
63
+ <TabItem value="replicate" label="Replicate">
64
+
65
+ ```python
66
+ import os
67
+ from langchain.chat_models import ChatLiteLLM
68
+ from langchain.prompts.chat import (
69
+ ChatPromptTemplate,
70
+ SystemMessagePromptTemplate,
71
+ AIMessagePromptTemplate,
72
+ HumanMessagePromptTemplate,
73
+ )
74
+ from langchain.schema import AIMessage, HumanMessage, SystemMessage
75
+
76
+ os.environ['REPLICATE_API_TOKEN'] = ""
77
+ chat = ChatLiteLLM(model="replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1")
78
+ messages = [
79
+ HumanMessage(
80
+ content="what model are you?"
81
+ )
82
+ ]
83
+ chat(messages)
84
+ ```
85
+
86
+ </TabItem>
87
+
88
+ <TabItem value="cohere" label="Cohere">
89
+
90
+ ```python
91
+ import os
92
+ from langchain.chat_models import ChatLiteLLM
93
+ from langchain.prompts.chat import (
94
+ ChatPromptTemplate,
95
+ SystemMessagePromptTemplate,
96
+ AIMessagePromptTemplate,
97
+ HumanMessagePromptTemplate,
98
+ )
99
+ from langchain.schema import AIMessage, HumanMessage, SystemMessage
100
+
101
+ os.environ['COHERE_API_KEY'] = ""
102
+ chat = ChatLiteLLM(model="command-nightly")
103
+ messages = [
104
+ HumanMessage(
105
+ content="what model are you?"
106
+ )
107
+ ]
108
+ chat(messages)
109
+ ```
110
+
111
+ </TabItem>
112
+ <TabItem value="palm" label="PaLM - Google">
113
+
114
+ ```python
115
+ import os
116
+ from langchain.chat_models import ChatLiteLLM
117
+ from langchain.prompts.chat import (
118
+ ChatPromptTemplate,
119
+ SystemMessagePromptTemplate,
120
+ AIMessagePromptTemplate,
121
+ HumanMessagePromptTemplate,
122
+ )
123
+ from langchain.schema import AIMessage, HumanMessage, SystemMessage
124
+
125
+ os.environ['PALM_API_KEY'] = ""
126
+ chat = ChatLiteLLM(model="palm/chat-bison")
127
+ messages = [
128
+ HumanMessage(
129
+ content="what model are you?"
130
+ )
131
+ ]
132
+ chat(messages)
133
+ ```
134
+ </TabItem>
135
+ </Tabs>
docs/my-website/docs/migration.md ADDED
@@ -0,0 +1,35 @@
1
+ # Migration Guide - LiteLLM v1.0.0+
2
+
3
+ When we have breaking changes (i.e. going from 1.x.x to 2.x.x), we will document those changes here.
4
+
5
+
6
+ ## `1.0.0`
7
+
8
+ **Last Release before breaking change**: 0.14.0
9
+
10
+ **What changed?**
11
+
12
+ - Requires `openai>=1.0.0`
13
+ - `openai.InvalidRequestError`Β β†’Β `openai.BadRequestError`
14
+ - `openai.ServiceUnavailableError` β†’ `openai.APIStatusError`
15
+ - *NEW* litellm client, allow users to pass api_key
16
+ - `litellm.Litellm(api_key="sk-123")`
17
+ - response objects now inherit from `BaseModel` (prev. `OpenAIObject`)
18
+ - *NEW* default exception - `APIConnectionError` (prev. `APIError`)
19
+ - litellm.get_max_tokens() now returns an int not a dict
20
+ ```python
21
+ max_tokens = litellm.get_max_tokens("gpt-3.5-turbo") # returns an int not a dict
22
+ assert max_tokens==4097
23
+ ```
24
+ - Streaming - OpenAI Chunks now return `None` for empty stream chunks. This is how to process stream chunks with content
25
+ ```python
26
+ response = litellm.completion(model="gpt-3.5-turbo", messages=messages, stream=True)
27
+ for part in response:
28
+ print(part.choices[0].delta.content or "")
29
+ ```
30
+
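+ For example, exception handlers written against the old names would be updated roughly like this - a minimal sketch of the rename only, not a full migration:
+ ```python
+ import openai
+
+ try:
+     ...  # your litellm.completion() call
+ # before (litellm < 1.0.0): except openai.InvalidRequestError as e:
+ except openai.BadRequestError as e:  # after (litellm >= 1.0.0)
+     print(e)
+ ```
+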
31
+ **How can we communicate changes better?**
32
+ Tell us
33
+ - [Discord](https://discord.com/invite/wuPM9dRgDw)
34
35
+ - Text us (+17708783106)
docs/my-website/docs/observability/callbacks.md ADDED
@@ -0,0 +1,35 @@
1
+ # Callbacks
2
+
3
+ ## Use Callbacks to send Output Data to Posthog, Sentry etc
4
+
5
+ liteLLM provides `input_callback`, `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
6
+
7
+ liteLLM supports:
8
+
9
+ - [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback)
10
+ - [LLMonitor](https://llmonitor.com/docs)
11
+ - [Helicone](https://docs.helicone.ai/introduction)
12
+ - [Traceloop](https://traceloop.com/docs)
13
+ - [Sentry](https://docs.sentry.io/platforms/python/)
14
+ - [PostHog](https://posthog.com/docs/libraries/python)
15
+ - [Slack](https://slack.dev/bolt-python/concepts)
16
+
17
+ ### Quick Start
18
+
19
+ ```python
20
+ import litellm
+ import os
+ from litellm import completion
+
+ messages = [{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}]
21
+
22
+ # set callbacks
23
+ litellm.input_callback=["sentry"] # for sentry breadcrumbing - logs the input being sent to the api
24
+ litellm.success_callback=["posthog", "helicone", "llmonitor"]
25
+ litellm.failure_callback=["sentry", "llmonitor"]
26
+
27
+ ## set env variables
28
+ os.environ['SENTRY_DSN'], os.environ['SENTRY_API_TRACE_RATE'] = "", ""
29
+ os.environ['POSTHOG_API_KEY'], os.environ['POSTHOG_API_URL'] = "api-key", "api-url"
30
+ os.environ["HELICONE_API_KEY"] = ""
31
+ os.environ["TRACELOOP_API_KEY"] = ""
32
+ os.environ["LLMONITOR_APP_ID"] = ""
33
+
34
+ response = completion(model="gpt-3.5-turbo", messages=messages)
35
+ ```
docs/my-website/docs/observability/custom_callback.md ADDED
@@ -0,0 +1,358 @@
1
+ # Custom Callbacks
2
+
3
+ ## Callback Class
4
+ You can create a custom callback class to precisely log events as they occur in litellm.
5
+
6
+ ```python
7
+ import litellm
+ from litellm import completion
+ from litellm.integrations.custom_logger import CustomLogger
8
+
9
+ class MyCustomHandler(CustomLogger):
10
+ def log_pre_api_call(self, model, messages, kwargs):
11
+ print(f"Pre-API Call")
12
+
13
+ def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
14
+ print(f"Post-API Call")
15
+
16
+ def log_stream_event(self, kwargs, response_obj, start_time, end_time):
17
+ print(f"On Stream")
18
+
19
+ def log_success_event(self, kwargs, response_obj, start_time, end_time):
20
+ print(f"On Success")
21
+
22
+ def log_failure_event(self, kwargs, response_obj, start_time, end_time):
23
+ print(f"On Failure")
24
+
25
+ customHandler = MyCustomHandler()
26
+
27
+ litellm.callbacks = [customHandler]
28
+ response = completion(model="gpt-3.5-turbo", messages=[{ "role": "user", "content": "Hi πŸ‘‹ - i'm openai"}],
29
+ stream=True)
30
+ for chunk in response:
31
+ continue
32
+ ```
33
+
34
+ ## Callback Functions
35
+ If you just want to log on a specific event (e.g. on input) - you can use callback functions.
36
+
37
+ You can set custom callbacks to trigger for:
38
+ - `litellm.input_callback` - Track inputs/transformed inputs before making the LLM API call
39
+ - `litellm.success_callback` - Track inputs/outputs after making LLM API call
40
+ - `litellm.failure_callback` - Track inputs/outputs + exceptions for litellm calls
41
+
42
+ ## Defining a Custom Callback Function
43
+ Create a custom callback function that takes specific arguments:
44
+
45
+ ```python
46
+ def custom_callback(
47
+ kwargs, # kwargs to completion
48
+ completion_response, # response from completion
49
+ start_time, end_time # start/end time
50
+ ):
51
+ # Your custom code here
52
+ print("LITELLM: in custom callback function")
53
+ print("kwargs", kwargs)
54
+ print("completion_response", completion_response)
55
+ print("start_time", start_time)
56
+ print("end_time", end_time)
57
+ ```
58
+
59
+ ### Setting the custom callback function
60
+ ```python
61
+ import litellm
62
+ litellm.success_callback = [custom_callback]
63
+ ```
64
+
65
+ ## Using Your Custom Callback Function
66
+
67
+ ```python
68
+ import litellm
69
+ from litellm import completion
70
+
71
+ # Assign the custom callback function
72
+ litellm.success_callback = [custom_callback]
73
+
74
+ response = completion(
75
+ model="gpt-3.5-turbo",
76
+ messages=[
77
+ {
78
+ "role": "user",
79
+ "content": "Hi πŸ‘‹ - i'm openai"
80
+ }
81
+ ]
82
+ )
83
+
84
+ print(response)
85
+
86
+ ```
87
+
88
+ ## Async Callback Functions
89
+
90
+ LiteLLM currently supports just async success callback functions for async completion/embedding calls.
91
+
92
+ ```python
93
+ import asyncio, litellm
94
+
95
+ async def async_test_logging_fn(kwargs, completion_obj, start_time, end_time):
96
+ print(f"On Async Success!")
97
+
98
+ async def test_chat_openai():
99
+ try:
100
+ # litellm.set_verbose = True
101
+ litellm.success_callback = [async_test_logging_fn]
102
+ response = await litellm.acompletion(model="gpt-3.5-turbo",
103
+ messages=[{
104
+ "role": "user",
105
+ "content": "Hi πŸ‘‹ - i'm openai"
106
+ }],
107
+ stream=True)
108
+ async for chunk in response:
109
+ continue
110
+ except Exception as e:
111
+ print(e)
112
+ raise  # re-raise so the failure is visible (avoids needing a pytest import)
113
+
114
+ asyncio.run(test_chat_openai())
115
+ ```
116
+
117
+ :::info
118
+
119
+ We're actively trying to expand this to other event types. [Tell us if you need this!](https://github.com/BerriAI/litellm/issues/1007)
120
+
121
+
122
+
123
+ :::
124
+
125
+ ## What's in kwargs?
126
+
127
+ Notice we pass in a kwargs argument to custom callback.
128
+ ```python
129
+ def custom_callback(
130
+ kwargs, # kwargs to completion
131
+ completion_response, # response from completion
132
+ start_time, end_time # start/end time
133
+ ):
134
+ # Your custom code here
135
+ print("LITELLM: in custom callback function")
136
+ print("kwargs", kwargs)
137
+ print("completion_response", completion_response)
138
+ print("start_time", start_time)
139
+ print("end_time", end_time)
140
+ ```
141
+
142
+ This is a dictionary containing all the model-call details (the params we receive, the values we send to the http endpoint, the response we receive, stacktrace in case of errors, etc.).
143
+
144
+ This is all logged in the [model_call_details via our Logger](https://github.com/BerriAI/litellm/blob/fc757dc1b47d2eb9d0ea47d6ad224955b705059d/litellm/utils.py#L246).
145
+
146
+ Here's exactly what you can expect in the kwargs dictionary:
147
+ ```shell
148
+ ### DEFAULT PARAMS ###
149
+ "model": self.model,
150
+ "messages": self.messages,
151
+ "optional_params": self.optional_params, # model-specific params passed in
152
+ "litellm_params": self.litellm_params, # litellm-specific params passed in (e.g. metadata passed to completion call)
153
+ "start_time": self.start_time, # datetime object of when call was started
154
+
155
+ ### PRE-API CALL PARAMS ### (check via kwargs["log_event_type"]="pre_api_call")
156
+ "input" = input # the exact prompt sent to the LLM API
157
+ "api_key" = api_key # the api key used for that LLM API
158
+ "additional_args" = additional_args # any additional details for that API call (e.g. contains optional params sent)
159
+
160
+ ### POST-API CALL PARAMS ### (check via kwargs["log_event_type"]="post_api_call")
161
+ "original_response" = original_response # the original http response received (saved via response.text)
162
+
163
+ ### ON-SUCCESS PARAMS ### (check via kwargs["log_event_type"]="successful_api_call")
164
+ "complete_streaming_response" = complete_streaming_response # the complete streamed response (only set if `completion(..stream=True)`)
165
+ "end_time" = end_time # datetime object of when call was completed
166
+
167
+ ### ON-FAILURE PARAMS ### (check via kwargs["log_event_type"]="failed_api_call")
168
+ "exception" = exception # the Exception raised
169
+ "traceback_exception" = traceback_exception # the traceback generated via `traceback.format_exc()`
170
+ "end_time" = end_time # datetime object of when call was completed
171
+ ```
172
+
173
+ ### Get complete streaming response
174
+
175
+ LiteLLM will pass you the complete streaming response in the final streaming chunk as part of the kwargs for your custom callback function.
176
+
177
+ ```python
178
+ # litellm.set_verbose = False
179
+ def custom_callback(
180
+ kwargs, # kwargs to completion
181
+ completion_response, # response from completion
182
+ start_time, end_time # start/end time
183
+ ):
184
+ # print(f"streaming response: {completion_response}")
185
+ if "complete_streaming_response" in kwargs:
186
+ print(f"Complete Streaming Response: {kwargs['complete_streaming_response']}")
187
+
188
+ # Assign the custom callback function
189
+ litellm.success_callback = [custom_callback]
190
+
191
+ response = completion(model="claude-instant-1", messages=messages, stream=True)
192
+ for idx, chunk in enumerate(response):
193
+ pass
194
+ ```
195
+
196
+
197
+ ### Log additional metadata
198
+
199
+ LiteLLM accepts a metadata dictionary in the completion call. You can pass additional metadata into your completion call via `completion(..., metadata={"key": "value"})`.
200
+
201
+ Since this is a [litellm-specific param](https://github.com/BerriAI/litellm/blob/b6a015404eed8a0fa701e98f4581604629300ee3/litellm/main.py#L235), it's accessible via kwargs["litellm_params"]
202
+
203
+ ```python
204
+ from litellm import completion
205
+ import os, litellm
206
+
207
+ ## set ENV variables
208
+ os.environ["OPENAI_API_KEY"] = "your-api-key"
209
+
210
+ messages = [{ "content": "Hello, how are you?","role": "user"}]
211
+
212
+ def custom_callback(
213
+ kwargs, # kwargs to completion
214
+ completion_response, # response from completion
215
+ start_time, end_time # start/end time
216
+ ):
217
+ print(kwargs["litellm_params"]["metadata"])
218
+
219
+
220
+ # Assign the custom callback function
221
+ litellm.success_callback = [custom_callback]
222
+
223
+ response = litellm.completion(model="gpt-3.5-turbo", messages=messages, metadata={"hello": "world"})
224
+ ```
225
+
226
+ ## Examples
227
+
228
+ ### Custom Callback to track costs for Streaming + Non-Streaming
229
+ ```python
230
+
231
+ import litellm, logging
+ from litellm import completion
+
+ def track_cost_callback(
232
+ kwargs, # kwargs to completion
233
+ completion_response, # response from completion
234
+ start_time, end_time # start/end time
235
+ ):
236
+ try:
237
+ # init logging config
238
+ logging.basicConfig(
239
+ filename='cost.log',
240
+ level=logging.INFO,
241
+ format='%(asctime)s - %(message)s',
242
+ datefmt='%Y-%m-%d %H:%M:%S'
243
+ )
244
+
245
+ # check if it has collected an entire stream response
246
+ if "complete_streaming_response" in kwargs:
247
+ # for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
248
+ completion_response=kwargs["complete_streaming_response"]
249
+ input_text = kwargs["messages"]
250
+ output_text = completion_response["choices"][0]["message"]["content"]
251
+ response_cost = litellm.completion_cost(
252
+ model = kwargs["model"],
253
+ messages = input_text,
254
+ completion=output_text
255
+ )
256
+ print("streaming response_cost", response_cost)
257
+ logging.info(f"Model {kwargs['model']} Cost: ${response_cost:.8f}")
258
+
259
+ # for non streaming responses
260
+ else:
261
+ # we pass the completion_response obj
262
+ if kwargs["stream"] != True:
263
+ response_cost = litellm.completion_cost(completion_response=completion_response)
264
+ print("regular response_cost", response_cost)
265
+ logging.info(f"Model {completion_response.model} Cost: ${response_cost:.8f}")
266
+ except:
267
+ pass
268
+
269
+ # Assign the custom callback function
270
+ litellm.success_callback = [track_cost_callback]
271
+
272
+ response = completion(
273
+ model="gpt-3.5-turbo",
274
+ messages=[
275
+ {
276
+ "role": "user",
277
+ "content": "Hi πŸ‘‹ - i'm openai"
278
+ }
279
+ ]
280
+ )
281
+
282
+ print(response)
283
+ ```
284
+
285
+ ### Custom Callback to log transformed Input to LLMs
286
+ ```python
287
+ import litellm
+ from litellm import completion
+
+ def get_transformed_inputs(
288
+ kwargs,
289
+ ):
290
+ params_to_model = kwargs["additional_args"]["complete_input_dict"]
291
+ print("params to model", params_to_model)
292
+
293
+ litellm.input_callback = [get_transformed_inputs]
294
+
295
+ def test_chat_openai():
296
+ try:
297
+ response = completion(model="claude-2",
298
+ messages=[{
299
+ "role": "user",
300
+ "content": "Hi πŸ‘‹ - i'm openai"
301
+ }])
302
+
303
+ print(response)
304
+
305
+ except Exception as e:
306
+ print(e)
307
+ pass
308
+ ```
309
+
310
+ #### Output
311
+ ```shell
312
+ params to model {'model': 'claude-2', 'prompt': "\n\nHuman: Hi πŸ‘‹ - i'm openai\n\nAssistant: ", 'max_tokens_to_sample': 256}
313
+ ```
314
+
315
+ ### Custom Callback to write to Mixpanel
316
+
317
+ ```python
318
+ import mixpanel
319
+ import litellm
320
+ from litellm import completion
321
+
322
+ def custom_callback(
323
+ kwargs, # kwargs to completion
324
+ completion_response, # response from completion
325
+ start_time, end_time # start/end time
326
+ ):
327
+ # Your custom code here
328
+ mixpanel.track("LLM Response", {"llm_response": completion_response})
329
+
330
+
331
+ # Assign the custom callback function
332
+ litellm.success_callback = [custom_callback]
333
+
334
+ response = completion(
335
+ model="gpt-3.5-turbo",
336
+ messages=[
337
+ {
338
+ "role": "user",
339
+ "content": "Hi πŸ‘‹ - i'm openai"
340
+ }
341
+ ]
342
+ )
343
+
344
+ print(response)
345
+
346
+ ```
347
+
348
+
349
+
350
+
351
+
352
+
353
+
354
+
355
+
356
+
357
+
358
+
docs/my-website/docs/observability/helicone_integration.md ADDED
@@ -0,0 +1,55 @@
1
+ # Helicone Tutorial
2
+ [Helicone](https://helicone.ai/) is an open source observability platform that proxies your OpenAI traffic and provides you key insights into your spend, latency and usage.
3
+
4
+ ## Use Helicone to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
5
+ liteLLM provides `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
6
+
7
+ In this case, we want to log requests to Helicone when a request succeeds.
8
+
9
+ ### Approach 1: Use Callbacks
10
+ Use just 1 line of code, to instantly log your responses **across all providers** with helicone:
11
+ ```python
12
+ litellm.success_callback=["helicone"]
13
+ ```
14
+
15
+ Complete code
16
+ ```python
17
+ import os
+ import litellm
+ from litellm import completion
18
+
19
+ ## set env variables
20
+ os.environ["HELICONE_API_KEY"] = "your-helicone-key"
21
+ os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
22
+
23
+ # set callbacks
24
+ litellm.success_callback=["helicone"]
25
+
26
+ #openai call
27
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}])
28
+
29
+ #cohere call
30
+ response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm cohere"}])
31
+ ```
32
+
33
+ ### Approach 2: [OpenAI + Azure only] Use Helicone as a proxy
34
+ Helicone provides advanced functionality like caching, etc. Helicone currently supports this for Azure and OpenAI.
35
+
36
+ If you want to use Helicone to proxy your OpenAI/Azure requests, then you can -
37
+
38
+ - Set helicone as your base url via: `litellm.api_base`
39
+ - Pass in helicone request headers via: `litellm.headers`
40
+
41
+ Complete Code
42
+ ```python
43
+ import os
+ import litellm
44
+ from litellm import completion
45
+
46
+ litellm.api_base = "https://oai.hconeai.com/v1"
47
+ litellm.headers = {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"}
48
+
49
+ response = litellm.completion(
50
+ model="gpt-3.5-turbo",
51
+ messages=[{"role": "user", "content": "how does a court case get to the Supreme Court?"}]
52
+ )
53
+
54
+ print(response)
55
+ ```
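+
+ Since the proxy approach works through request headers, other Helicone features can be enabled the same way. A sketch for caching, assuming Helicone's `Helicone-Cache-Enabled` header (check Helicone's docs for the exact header names):
+
+ ```python
+ import os
+ import litellm
+
+ litellm.api_base = "https://oai.hconeai.com/v1"
+ litellm.headers = {
+     "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
+     "Helicone-Cache-Enabled": "true",  # assumed caching header; verify against Helicone's docs
+ }
+
+ response = litellm.completion(
+     model="gpt-3.5-turbo",
+     messages=[{"role": "user", "content": "how does a court case get to the Supreme Court?"}]
+ )
+ print(response)
+ ```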
docs/my-website/docs/observability/langfuse_integration.md ADDED
@@ -0,0 +1,105 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Langfuse - Logging LLM Input/Output
4
+
5
+ Langfuse is open-source observability & analytics for LLM apps, offering
6
+ detailed production traces and a granular view on quality, cost, and latency.
7
+
8
+ <Image img={require('../../img/langfuse.png')} />
9
+
10
+ :::info
11
+ We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
12
+ join our [discord](https://discord.gg/wuPM9dRgDw)
13
+ :::
14
+
15
+ ## Pre-Requisites
16
+ Ensure you have run `pip install langfuse` for this integration
17
+ ```shell
18
+ pip install langfuse litellm
19
+ ```
20
+
21
+ ## Quick Start
22
+ Use just 2 lines of code to instantly log your responses **across all providers** with Langfuse
23
+ <a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/logging_observability/LiteLLM_Langfuse.ipynb">
24
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
25
+ </a>
26
+
27
+ Get your Langfuse API Keys from https://cloud.langfuse.com/
28
+ ```python
29
+ litellm.success_callback = ["langfuse"]
30
+ ```
31
+ ```python
32
+ # pip install langfuse
33
+ import litellm
34
+ import os
35
+
36
+ # from https://cloud.langfuse.com/
37
+ os.environ["LANGFUSE_PUBLIC_KEY"] = ""
38
+ os.environ["LANGFUSE_SECRET_KEY"] = ""
39
+ # Optional, defaults to https://cloud.langfuse.com
40
+ # os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
41
+
42
+ # LLM API Keys
43
+ os.environ['OPENAI_API_KEY']=""
44
+
45
+ # set langfuse as a callback, litellm will send the data to langfuse
46
+ litellm.success_callback = ["langfuse"]
47
+
48
+ # openai call
49
+ response = litellm.completion(
50
+ model="gpt-3.5-turbo",
51
+ messages=[
52
+ {"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}
53
+ ]
54
+ )
55
+ ```
56
+
57
+ ## Advanced
58
+ ### Set Custom Generation names, pass metadata
59
+
60
+ ```python
61
+ import litellm
62
+ from litellm import completion
63
+ import os
64
+
65
+ # from https://cloud.langfuse.com/
66
+ os.environ["LANGFUSE_PUBLIC_KEY"] = ""
67
+ os.environ["LANGFUSE_SECRET_KEY"] = ""
68
+
69
+
70
+ # OpenAI and Cohere keys
71
+ # You can use any of the litellm supported providers: https://docs.litellm.ai/docs/providers
72
+ os.environ['OPENAI_API_KEY']=""
73
+
74
+ # set langfuse as a callback, litellm will send the data to langfuse
75
+ litellm.success_callback = ["langfuse"]
76
+
77
+ # openai call
78
+ response = completion(
79
+ model="gpt-3.5-turbo",
80
+ messages=[
81
+ {"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}
82
+ ],
83
+ metadata = {
84
+ "generation_name": "litellm-ishaan-gen", # set langfuse generation name
85
+ # custom metadata fields
86
+ "project": "litellm-proxy"
87
+ }
88
+ )
89
+
90
+ print(response)
91
+
92
+ ```
93
+
94
+
95
+
96
+ ## Troubleshooting & Errors
97
+ ### Data not getting logged to Langfuse?
98
+ - Ensure you're on the latest version of langfuse: `pip install -U langfuse`. The latest version allows litellm to log JSON inputs/outputs to Langfuse. See the sketch below for a quick way to verify your setup.
99
+
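+ If data is still missing, a quick sanity check (a sketch; the version check uses the standard library, and `litellm.set_verbose` is assumed to print callback debug output):
+
+ ```python
+ from importlib.metadata import version
+ import litellm
+
+ # confirm you're on recent versions of both packages
+ print("langfuse:", version("langfuse"), "| litellm:", version("litellm"))
+
+ # assumes your LANGFUSE_* keys and LLM provider key are already set in the environment
+ litellm.set_verbose = True  # print debug logs, including callback activity
+ litellm.success_callback = ["langfuse"]
+ response = litellm.completion(
+     model="gpt-3.5-turbo",
+     messages=[{"role": "user", "content": "test langfuse logging"}]
+ )
+ ```
+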
100
+ ## Support & Talk to Founders
101
+
102
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
103
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
104
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
105
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/observability/langsmith_integration.md ADDED
@@ -0,0 +1,77 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Langsmith - Logging LLM Input/Output
4
+ An all-in-one developer platform for every step of the application lifecycle.
5
+ https://smith.langchain.com/
6
+
7
+ <Image img={require('../../img/langsmith.png')} />
8
+
9
+ :::info
10
+ We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
11
+ join our [discord](https://discord.gg/wuPM9dRgDw)
12
+ :::
13
+
14
+ ## Pre-Requisites
15
+ ```shell
16
+ pip install litellm
17
+ ```
18
+
19
+ ## Quick Start
20
+ Use just 2 lines of code to instantly log your responses **across all providers** with Langsmith
21
+
22
+
23
+ ```python
24
+ litellm.success_callback = ["langsmith"]
25
+ ```
26
+ ```python
27
+ import litellm
28
+ import os
29
+
30
+ os.environ["LANGSMITH_API_KEY"] = ""
31
+ # LLM API Keys
32
+ os.environ['OPENAI_API_KEY']=""
33
+
34
+ # set langsmith as a callback, litellm will send the data to langsmith
35
+ litellm.success_callback = ["langsmith"]
36
+
37
+ # openai call
38
+ response = litellm.completion(
39
+ model="gpt-3.5-turbo",
40
+ messages=[
41
+ {"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}
42
+ ]
43
+ )
44
+ ```
45
+
46
+ ## Advanced
47
+ ### Set Custom Project & Run names
48
+
49
+ ```python
50
+ import litellm
51
+ import os
52
+
53
+ os.environ["LANGSMITH_API_KEY"] = ""
54
+ # LLM API Keys
55
+ os.environ['OPENAI_API_KEY']=""
56
+
57
+ # set langsmith as a callback, litellm will send the data to langsmith
58
+ litellm.success_callback = ["langsmith"]
59
+
60
+ response = litellm.completion(
61
+ model="gpt-3.5-turbo",
62
+ messages=[
63
+ {"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}
64
+ ],
65
+ metadata={
66
+ "run_name": "litellmRUN", # langsmith run name
67
+ "project_name": "litellm-completion", # langsmith project name
68
+ }
69
+ )
70
+ print(response)
71
+ ```
72
+ ## Support & Talk to Founders
73
+
74
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
75
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
76
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
77
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/observability/llmonitor_integration.md ADDED
@@ -0,0 +1,65 @@
1
+ # LLMonitor Tutorial
2
+
3
+ [LLMonitor](https://llmonitor.com/) is an open-source observability platform that provides cost tracking, user tracking and powerful agent tracing.
4
+
5
+ <video controls width='900' >
6
+ <source src='https://llmonitor.com/videos/demo-annotated.mp4'/>
7
+ </video>
8
+
9
+ ## Use LLMonitor to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
10
+
11
+ liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses.
12
+
13
+ :::info
14
+ We want to learn how we can make the callbacks better! Meet the [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
15
+ join our [discord](https://discord.gg/wuPM9dRgDw)
16
+ :::
17
+
18
+ ### Using Callbacks
19
+
20
+ First, sign up to get an app ID on the [LLMonitor dashboard](https://llmonitor.com).
21
+
22
+ Use just 2 lines of code to instantly log your responses **across all providers** with LLMonitor:
23
+
24
+ ```python
25
+ litellm.success_callback = ["llmonitor"]
26
+ litellm.failure_callback = ["llmonitor"]
27
+ ```
28
+
29
+ Complete code
30
+
31
+ ```python
32
+ import os
+ import litellm
+ from litellm import completion
33
+
34
+ ## set env variables
35
+ os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
36
+ # Optional: os.environ["LLMONITOR_API_URL"] = "self-hosting-url"
37
+
38
+ os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
39
+
40
+ # set callbacks
41
+ litellm.success_callback = ["llmonitor"]
42
+ litellm.failure_callback = ["llmonitor"]
43
+
44
+ #openai call
45
+ response = completion(
46
+ model="gpt-3.5-turbo",
47
+ messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}],
48
+ user="ishaan_litellm"
49
+ )
50
+
51
+ #cohere call
52
+ response = completion(
53
+ model="command-nightly",
54
+ messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm cohere"}],
55
+ user="ishaan_litellm"
56
+ )
57
+ ```
58
+
59
+ ## Support & Talk to Founders
60
+
61
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
62
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
63
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
64
+ - Our emails βœ‰οΈ [email protected] / [email protected]
65
+ - Meet the LLMonitor team on [Discord](http://discord.com/invite/8PafSG58kK) or via [email](mailto:[email protected]).
docs/my-website/docs/observability/promptlayer_integration.md ADDED
@@ -0,0 +1,77 @@
1
+ import Image from '@theme/IdealImage';
+
+ # Promptlayer Tutorial
2
+
3
+ Promptlayer is a platform for prompt engineers. Log OpenAI requests. Search usage history. Track performance. Visually manage prompt templates.
4
+
5
+ <Image img={require('../../img/promptlayer.png')} />
6
+
7
+ ## Use Promptlayer to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
8
+
9
+ liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses.
10
+
11
+ ### Using Callbacks
12
+
13
+ Get your PromptLayer API Key from https://promptlayer.com/
14
+
15
+ Use just 2 lines of code to instantly log your responses **across all providers** with PromptLayer:
16
+
17
+ ```python
18
+ litellm.success_callback = ["promptlayer"]
19
+
20
+ ```
21
+
22
+ Complete code
23
+
24
+ ```python
25
+ import os
+ import litellm
+ from litellm import completion
26
+
27
+ ## set env variables
28
+ os.environ["PROMPTLAYER_API_KEY"] = "your-promptlayer-key"
29
+
30
+ os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
31
+
32
+ # set callbacks
33
+ litellm.success_callback = ["promptlayer"]
34
+
35
+ #openai call
36
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}])
37
+
38
+ #cohere call
39
+ response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm cohere"}])
40
+ ```
41
+
42
+ ### Logging Metadata
43
+
44
+ You can also log completion call metadata to Promptlayer.
45
+
46
+ You can add metadata to a completion call through the metadata param:
47
+ ```python
48
+ completion(model, messages, metadata={"model": "ai21"})
49
+ ```
50
+
51
+ **Complete Code**
52
+ ```python
53
+ import os
+ import litellm
+ from litellm import completion
54
+
55
+ ## set env variables
56
+ os.environ["PROMPTLAYER_API_KEY"] = "your-promptlayer-key"
57
+
58
+ os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", ""
59
+
60
+ # set callbacks
61
+ litellm.success_callback = ["promptlayer"]
62
+
63
+ #openai call - log llm provider is openai
64
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}], metadata={"provider": "openai"})
65
+
66
+ #cohere call - log llm provider is cohere
67
+ response = completion(model="command-nightly", messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm cohere"}], metadata={"provider": "cohere"})
68
+ ```
69
+
70
+ Credits to [Nick Bradford](https://github.com/nsbradford), from [Vim-GPT](https://github.com/nsbradford/VimGPT), for the suggestion.
71
+
72
+ ## Support & Talk to Founders
73
+
74
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
75
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
76
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
77
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/observability/sentry.md ADDED
@@ -0,0 +1,44 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Sentry - Log LLM Exceptions
4
+ [Sentry](https://sentry.io/) provides error monitoring for production. LiteLLM can add breadcrumbs and send exceptions to Sentry with this integration.
5
+
6
+ Track exceptions for:
7
+ - litellm.completion() - completion() for 100+ LLMs
8
+ - litellm.acompletion() - async completion()
9
+ - Streaming completion() & acompletion() calls
10
+
11
+ <Image img={require('../../img/sentry.png')} />
12
+
13
+
14
+ ## Usage
15
+
16
+ ### Set SENTRY_DSN & callback
17
+
18
+ ```python
19
+ import litellm, os
20
+ os.environ["SENTRY_DSN"] = "your-sentry-url"
21
+ litellm.failure_callback=["sentry"]
22
+ ```
23
+
24
+ ### Sentry callback with completion
25
+ ```python
26
+ import litellm
27
+ from litellm import completion
28
+
29
+ litellm.input_callback=["sentry"] # adds sentry breadcrumbing
30
+ litellm.failure_callback=["sentry"] # [OPTIONAL] if you want litellm to capture -> send exception to sentry
31
+
32
+ import os
33
+ os.environ["SENTRY_DSN"] = "your-sentry-url"
34
+ os.environ["OPENAI_API_KEY"] = "your-openai-key"
35
+
36
+ # set bad key to trigger error
37
+ api_key="bad-key"
38
+ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey!"}], stream=True, api_key=api_key)
39
+
40
+ print(response)
41
+ ```
42
+
43
+ [Let us know](https://github.com/BerriAI/litellm/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.yml&title=%5BFeature%5D%3A+) if you need any additional options from Sentry.
44
+
docs/my-website/docs/observability/slack_integration.md ADDED
@@ -0,0 +1,93 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Slack - Logging LLM Input/Output, Exceptions
4
+
5
+ <Image img={require('../../img/slack.png')} />
6
+
7
+ :::info
8
+ We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
9
+ join our [discord](https://discord.gg/wuPM9dRgDw)
10
+ :::
11
+
12
+ ## Pre-Requisites
13
+
14
+ ### Step 1
15
+ ```shell
16
+ pip install litellm
17
+ ```
18
+
19
+ ### Step 2
20
+ Get a slack webhook url from https://api.slack.com/messaging/webhooks
21
+
22
+
23
+
24
+ ## Quick Start
25
+ ### Create a custom Callback to log to slack
26
+ We create a custom callback, to log to slack webhooks, see [custom callbacks on litellm](https://docs.litellm.ai/docs/observability/custom_callback)
27
+ ```python
28
+ import os
+
+ def send_slack_alert(
29
+ kwargs,
30
+ completion_response,
31
+ start_time,
32
+ end_time,
33
+ ):
34
+ print(
35
+ "in custom slack callback func"
36
+ )
37
+ import requests
38
+ import json
39
+
40
+ # Define the Slack webhook URL
41
+ # get it from https://api.slack.com/messaging/webhooks
42
+ slack_webhook_url = os.environ['SLACK_WEBHOOK_URL'] # "https://hooks.slack.com/services/<>/<>/<>"
43
+
44
+ # Define the text payload, send data available in litellm custom_callbacks
45
+ text_payload = f"""LiteLLM Logging: kwargs: {str(kwargs)}\n\n, response: {str(completion_response)}\n\n, start time{str(start_time)} end time: {str(end_time)}
46
+ """
47
+ payload = {
48
+ "text": text_payload
49
+ }
50
+
51
+ # Set the headers
52
+ headers = {
53
+ "Content-type": "application/json"
54
+ }
55
+
56
+ # Make the POST request
57
+ response = requests.post(slack_webhook_url, json=payload, headers=headers)
58
+
59
+ # Check the response status
60
+ if response.status_code == 200:
61
+ print("Message sent successfully to Slack!")
62
+ else:
63
+ print(f"Failed to send message to Slack. Status code: {response.status_code}")
64
+ print(response.json())
65
+ ```
66
+
67
+ ### Pass callback to LiteLLM
68
+ ```python
69
+ litellm.success_callback = [send_slack_alert]
70
+ ```
71
+
72
+ ```python
73
+ import litellm
74
+ litellm.success_callback = [send_slack_alert] # log success
75
+ litellm.failure_callback = [send_slack_alert] # log exceptions
76
+
77
+ # this will raise an exception
78
+ response = litellm.completion(
79
+ model="gpt-2",
80
+ messages=[
81
+ {
82
+ "role": "user",
83
+ "content": "Hi πŸ‘‹ - i'm openai"
84
+ }
85
+ ]
86
+ )
87
+ ```
88
+ ## Support & Talk to Founders
89
+
90
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
91
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
92
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
93
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/observability/supabase_integration.md ADDED
@@ -0,0 +1,101 @@
1
+ # Supabase Tutorial
2
+ [Supabase](https://supabase.com/) is an open source Firebase alternative.
3
+ Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions, Storage, and Vector embeddings.
4
+
5
+ ## Use Supabase to log requests and see total spend across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM)
6
+ liteLLM provides `success_callback` and `failure_callback`, making it easy for you to send data to a particular provider depending on the status of your responses.
7
+
8
+ In this case, we want to log requests to Supabase in both scenarios - when a request succeeds and when it fails.
9
+
10
+ ### Create a supabase table
11
+
12
+ Go to your Supabase project, open the [Supabase SQL Editor](https://supabase.com/dashboard/projects), and create a new table with this configuration.
13
+
14
+ Note: You can change the table name. Just don't change the column names.
15
+
16
+ ```sql
17
+ create table
18
+ public.request_logs (
19
+ id bigint generated by default as identity,
20
+ created_at timestamp with time zone null default now(),
21
+ model text null default ''::text,
22
+ messages json null default '{}'::json,
23
+ response json null default '{}'::json,
24
+ end_user text null default ''::text,
25
+ status text null default ''::text,
26
+ error json null default '{}'::json,
27
+ response_time real null default '0'::real,
28
+ total_cost real null,
29
+ additional_details json null default '{}'::json,
30
+ litellm_call_id text unique,
31
+ primary key (id)
32
+ ) tablespace pg_default;
33
+ ```
34
+
35
+ ### Use Callbacks
36
+ Use just 2 lines of code to instantly see costs and log your responses **across all providers** with Supabase:
37
+
38
+ ```python
39
+ litellm.success_callback=["supabase"]
40
+ litellm.failure_callback=["supabase"]
41
+ ```
42
+
43
+ Complete code
44
+ ```python
45
+ import os
+ import litellm
+ from litellm import completion
46
+
47
+ ## set env variables
48
+ ### SUPABASE
49
+ os.environ["SUPABASE_URL"] = "your-supabase-url"
50
+ os.environ["SUPABASE_KEY"] = "your-supabase-key"
51
+
52
+ ## LLM API KEY
53
+ os.environ["OPENAI_API_KEY"] = ""
54
+
55
+ # set callbacks
56
+ litellm.success_callback=["supabase"]
57
+ litellm.failure_callback=["supabase"]
58
+
59
+ # openai call
60
+ response = completion(
61
+ model="gpt-3.5-turbo",
62
+ messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}],
63
+ user="ishaan22" # identify users
64
+ )
65
+
66
+ # bad call, expect this call to fail and get logged
67
+ response = completion(
68
+ model="chatgpt-test",
69
+ messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm a bad call to test error logging"}]
70
+ )
71
+
72
+ ```
73
+
74
+ ### Additional Controls
75
+
76
+ **Identify end-user**
77
+
78
+ Pass `user` to `litellm.completion` to map your llm call to an end-user
79
+
80
+ ```python
81
+ response = completion(
82
+ model="gpt-3.5-turbo",
83
+ messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}],
84
+ user="ishaan22" # identify users
85
+ )
86
+ ```
87
+
88
+ **Different Table name**
89
+
90
+ If you modified your table name, here's how to pass the new name.
91
+
92
+ ```python
93
+ litellm.modify_integration("supabase",{"table_name": "litellm_logs"})
94
+ ```
95
+
96
+ ## Support & Talk to Founders
97
+
98
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
99
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
100
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
101
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/observability/telemetry.md ADDED
@@ -0,0 +1,13 @@
1
+ # Telemetry
2
+
3
+ LiteLLM contains a telemetry feature that tells us what models are used, and what errors are hit.
4
+
5
+ ## What is logged?
6
+
7
+ Only the model name and the exception raised are logged.
8
+
9
+ ## Why?
10
+ We use this information to help us understand how LiteLLM is used, and improve stability.
11
+
12
+ ## Opting out
13
+ If you prefer to opt out of telemetry, you can do this by setting `litellm.telemetry = False`.
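+
+ For example, a minimal sketch of opting out for the current process (set this once at startup, before making any calls):
+
+ ```python
+ import litellm
+
+ # disable LiteLLM's anonymous telemetry
+ litellm.telemetry = False
+ ```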
docs/my-website/docs/observability/traceloop_integration.md ADDED
@@ -0,0 +1,34 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Traceloop (OpenLLMetry) - Tracing LLMs with OpenTelemetry
4
+
5
+ [Traceloop](https://traceloop.com) is a platform for monitoring and debugging the quality of your LLM outputs.
6
+ It provides you with a way to track the performance of your LLM application, roll out changes with confidence, and debug issues in production.
7
+ It is based on [OpenTelemetry](https://opentelemetry.io), so it can provide full visibility into your LLM requests, as well as vector DB usage and other infra in your stack.
8
+
9
+ <Image img={require('../../img/traceloop_dash.png')} />
10
+
11
+ ## Getting Started
12
+
13
+ Install the Traceloop SDK:
14
+
15
+ ```shell
16
+ pip install traceloop-sdk
17
+ ```
18
+
19
+ Use just 2 lines of code to instantly log your LLM responses with OpenTelemetry:
20
+
21
+ ```python
22
+ import litellm
+ from traceloop.sdk import Traceloop  # from the traceloop-sdk package
+
+ Traceloop.init(app_name="<YOUR APP NAME>", disable_batch=True)
23
+ litellm.success_callback = ["traceloop"]
24
+ ```
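+
+ Putting it together, a minimal sketch (the app name and empty API key are placeholders; this assumes `traceloop-sdk` is installed and exposes `Traceloop` via `from traceloop.sdk import Traceloop`):
+
+ ```python
+ import os
+ import litellm
+ from traceloop.sdk import Traceloop
+
+ os.environ["OPENAI_API_KEY"] = ""
+
+ Traceloop.init(app_name="litellm-demo", disable_batch=True)
+ litellm.success_callback = ["traceloop"]
+
+ # this completion call is traced and exported via OpenTelemetry
+ response = litellm.completion(
+     model="gpt-3.5-turbo",
+     messages=[{"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}]
+ )
+ print(response)
+ ```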
25
+
26
+ To get better visualizations on how your code behaves, you may want to annotate specific parts of your LLM chain. See [Traceloop docs on decorators](https://traceloop.com/docs/python-sdk/decorators) for more information.
27
+
28
+ ## Exporting traces to other systems (e.g. Datadog, New Relic, and others)
29
+
30
+ Since Traceloop SDK uses OpenTelemetry to send data, you can easily export your traces to other systems, such as Datadog, New Relic, and others. See [Traceloop docs on exporters](https://traceloop.com/docs/python-sdk/exporters) for more information.
31
+
32
+ ## Support
33
+
34
+ For any question or issue with integration you can reach out to the Traceloop team on [Slack](https://join.slack.com/t/traceloopcommunity/shared_invite/zt-1plpfpm6r-zOHKI028VkpcWdobX65C~g) or via [email](mailto:[email protected]).
docs/my-website/docs/observability/wandb_integration.md ADDED
@@ -0,0 +1,51 @@
1
+ import Image from '@theme/IdealImage';
2
+
3
+ # Weights & Biases - Logging LLM Input/Output
4
+ Weights & Biases helps AI developers build better models faster: https://wandb.ai
5
+
6
+ <Image img={require('../../img/wandb.png')} />
7
+
8
+ :::info
9
+ We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
10
+ join our [discord](https://discord.gg/wuPM9dRgDw)
11
+ :::
12
+
13
+ ## Pre-Requisites
14
+ Ensure you have run `pip install wandb` for this integration
15
+ ```shell
16
+ pip install wandb litellm
17
+ ```
18
+
19
+ ## Quick Start
20
+ Use just 2 lines of code to instantly log your responses **across all providers** with Weights & Biases
21
+
22
+ ```python
23
+ litellm.success_callback = ["wandb"]
24
+ ```
25
+ ```python
26
+ # pip install wandb
27
+ import litellm
28
+ import os
29
+
30
+ os.environ["WANDB_API_KEY"] = ""
31
+ # LLM API Keys
32
+ os.environ['OPENAI_API_KEY']=""
33
+
34
+ # set wandb as a callback, litellm will send the data to Weights & Biases
35
+ litellm.success_callback = ["wandb"]
36
+
37
+ # openai call
38
+ response = litellm.completion(
39
+ model="gpt-3.5-turbo",
40
+ messages=[
41
+ {"role": "user", "content": "Hi πŸ‘‹ - i'm openai"}
42
+ ]
43
+ )
44
+ ```
45
+
46
+ ## Support & Talk to Founders
47
+
48
+ - [Schedule Demo πŸ‘‹](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
49
+ - [Community Discord πŸ’­](https://discord.gg/wuPM9dRgDw)
50
+ - Our numbers πŸ“ž +1 (770) 8783-106 / ‭+1 (412) 618-6238‬
51
+ - Our emails βœ‰οΈ [email protected] / [email protected]
docs/my-website/docs/projects.md ADDED
@@ -0,0 +1,19 @@
1
+ # Projects Built on LiteLLM
2
+
3
+
4
+
5
+ ### EntoAI
6
+ Chat and Ask on your own data.
7
+ [Github](https://github.com/akshata29/entaoai)
8
+
9
+ ### GPT-Migrate
10
+ Easily migrate your codebase from one framework or language to another.
11
+ [Github](https://github.com/0xpayne/gpt-migrate)
12
+
13
+ ### Otter
14
+ Otter is a multi-modal model based on OpenFlamingo (an open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT, showcasing improved instruction-following and in-context learning ability.
15
+ [Github](https://github.com/Luodian/Otter)
16
+
17
+
18
+
19
+