Update README.md
README.md CHANGED
@@ -10,11 +10,11 @@ language:
 library_name: transformers
 ---
 
-# 
+# LLaMA-2-7B-32K
 
 ## Model Description
 
-
+LLaMA-2-7B-32K is an open-source, long context language model developed by Together, fine-tuned from Meta's original Llama-2 7B model.
 This model represents our efforts to contribute to the rapid progress of the open-source ecosystem for large language models.
 The model has been extended to a context length of 32K with position interpolation,
 allowing applications on multi-document QA, long text summarization, etc.
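As an editorial aside (not part of the diff): the new description mentions extending the context window to 32K with position interpolation, which in `transformers` terms corresponds to linear RoPE scaling. The sketch below is a hypothetical illustration of that idea only; the scaling factor 8.0 assumes the Llama-2 base context of 4K tokens, and the kwarg is not taken from the repo's actual config.

```python
from transformers import AutoModelForCausalLM

# Sketch only: position interpolation is implemented as linear RoPE scaling.
# factor 8.0 assumes a 4K base context stretched to 32K (4K * 8 = 32K) -- an
# assumption for illustration, not a value taken from the README or config.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    rope_scaling={"type": "linear", "factor": 8.0},
)
```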
@@ -44,7 +44,7 @@ To enhance the long-context ability, we exclude data shorter than 2K words. The i
 
 Next, we provide examples of how to fine-tune the model for specific applications.
 The example datasets are placed in [togethercomputer/Long-Data-Collections](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections)
-You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over
+You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over LLaMA-2-7B-32K.
 Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) for step-by-step illustrations.
 
 1. Long Context QA.
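For context (not part of the diff): the fine-tuning data referenced above can be pulled with the `datasets` library. The file path below is an assumption about the layout of togethercomputer/Long-Data-Collections; check the dataset card for the actual file names.

```python
from datasets import load_dataset

# Assumed file path inside the collection; see the dataset card for real names.
# .zst files additionally require the `zstandard` package.
ds = load_dataset(
    "togethercomputer/Long-Data-Collections",
    data_files="fine-tune/booksum.jsonl.zst",
    split="train",
    streaming=True,
)
print(next(iter(ds)))  # inspect one example before wiring up fine-tuning
```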
@@ -68,7 +68,7 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
 ## Inference
 
-You can use the [Together API](https://together.ai/blog/api-announcement) to try out
+You can use the [Together API](https://together.ai/blog/api-announcement) to try out LLaMA-2-7B-32K for inference.
 The updated inference stack allows for efficient inference.
 
 To run the model locally, we strongly recommend installing Flash Attention V2, which is necessary to obtain the best performance:
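As a rough illustration (not part of the diff), a call through the Together API might look like the sketch below. The `together` SDK client shown here, and the assumption that the model is served under its Hugging Face repo name, are editorial assumptions rather than something stated in the README.

```python
from together import Together  # assumes the `together` SDK and a TOGETHER_API_KEY env var

client = Together()
# Assumption: the model is (or was) served under its Hugging Face repo name.
resp = client.completions.create(
    model="togethercomputer/LLaMA-2-7B-32K",
    prompt="Summarize the following document:\n...",
    max_tokens=256,
)
print(resp.choices[0].text)
```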
@@ -87,8 +87,8 @@ You can use this model directly from the Hugging Face Model Hub or fine-tune it
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("togethercomputer/
-model = AutoModelForCausalLM.from_pretrained("togethercomputer/
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K", trust_remote_code=True, torch_dtype=torch.float16)
 
 input_context = "Your text here"
 input_ids = tokenizer.encode(input_context, return_tensors="pt")
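The snippet in this hunk stops at encoding (the rest of the README's example lies outside the diff window), and `torch_dtype=torch.float16` implies an `import torch` that is not visible here. A minimal, self-contained sketch of how the full example might run, with generation settings that are assumptions rather than the README's:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "togethercomputer/LLaMA-2-7B-32K"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,    # enables the repo's flash-attention code path, per the README
    torch_dtype=torch.float16,
    device_map="auto",         # assumption: requires the `accelerate` package
)

input_context = "Your text here"
input_ids = tokenizer.encode(input_context, return_tensors="pt").to(model.device)

# Generation parameters below are illustrative assumptions, not the README's.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```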
@@ -102,7 +102,7 @@ Alternatively, you can set `trust_remote_code=False` if you prefer not to use fl
 
 ## Limitations and Bias
 
-As with all language models,
+As with all language models, LLaMA-2-7B-32K may generate incorrect or biased content. It's important to keep this in mind when using the model.
 
 ## Community
 