Update README.md
README.md CHANGED
@@ -67,10 +67,10 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
 ## Inference
 
-You can use the Together API to try out Llama-2-7B-32K-beta for inference.
-The updated inference stack allows for efficient
+You can use the [Together API](https://together.ai/blog/api-announcement) to try out Llama-2-7B-32K-beta for inference.
+The updated inference stack allows for efficient inference.
 
-To run the model locally, we strongly recommend to install Flash Attention V2:
+To run the model locally, we strongly recommend to install Flash Attention V2, which is necessary to obtain the best performance:
 ```
 # Please update the path of `CUDA_HOME`
 export CUDA_HOME=/usr/local/cuda-11.8
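For context on the first changed line, below is a minimal sketch of what a request to the Together API might look like. The endpoint path, model identifier, and request fields are assumptions and are not part of this diff; check the linked API announcement or the current Together documentation for the exact interface, and supply your own API key.

```
# Hypothetical request shape -- endpoint, model id, and JSON fields are assumptions,
# not taken from this commit.
export TOGETHER_API_KEY="..."   # your Together API key

curl https://api.together.xyz/inference \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "togethercomputer/Llama-2-7B-32K-beta",
        "prompt": "Summarize the following document:\n...",
        "max_tokens": 128
      }'
```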