Update README.md
README.md CHANGED
@@ -10,11 +10,11 @@ language:
 library_name: transformers
 ---
 
-# 
+# LLaMA-2-7B-32K
 
 ## Model Description
 
-
+LLaMA-2-7B-32K is an open-source, long context language model developed by Together, fine-tuned from Meta's original Llama-2 7B model.
 This model represents our efforts to contribute to the rapid progress of the open-source ecosystem for large language models.
 The model has been extended to a context length of 32K with position interpolation,
 allowing applications on multi-document QA, long text summarization, etc.
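As an editorial aside (not part of the diff): the new description mentions extending the context window to 32K with position interpolation, which in `transformers` terms corresponds to linear RoPE scaling. The sketch below is a hypothetical illustration of that idea only; the scaling factor 8.0 assumes the Llama-2 base context of 4K tokens, and the kwarg is not taken from the repo's actual config.

```python
from transformers import AutoModelForCausalLM

# Sketch only: position interpolation is implemented as linear RoPE scaling.
# factor 8.0 assumes a 4K base context stretched to 32K (4K * 8 = 32K) -- an
# assumption for illustration, not a value taken from the README or config.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    rope_scaling={"type": "linear", "factor": 8.0},
)
```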
@@ -44,7 +44,7 @@ To enhance the long-context ability, we exclude data shorter than 2K words. The i
 
 Next, we provide examples of how to fine-tune the model for specific applications.
 The example datasets are placed in [togethercomputer/Long-Data-Collections](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections)
-You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over
+You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over LLaMA-2-7B-32K.
 Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) for step-by-step illustrations.
 
 1. Long Context QA.
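For context (not part of the diff): the fine-tuning data referenced above can be pulled with the `datasets` library. The file path below is an assumption about the layout of togethercomputer/Long-Data-Collections; check the dataset card for the actual file names.

```python
from datasets import load_dataset

# Assumed file path inside the collection; see the dataset card for real names.
# .zst files additionally require the `zstandard` package.
ds = load_dataset(
    "togethercomputer/Long-Data-Collections",
    data_files="fine-tune/booksum.jsonl.zst",
    split="train",
    streaming=True,
)
print(next(iter(ds)))  # inspect one example before wiring up fine-tuning
```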
@@ -68,7 +68,7 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
 ## Inference
 
-You can use the [Together API](https://together.ai/blog/api-announcement) to try out
+You can use the [Together API](https://together.ai/blog/api-announcement) to try out LLaMA-2-7B-32K for inference.
 The updated inference stack allows for efficient inference.
 
 To run the model locally, we strongly recommend installing Flash Attention V2, which is necessary to obtain the best performance:
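As a rough illustration (not part of the diff), a call through the Together API might look like the sketch below. The `together` SDK client shown here, and the assumption that the model is served under its Hugging Face repo name, are editorial assumptions rather than something stated in the README.

```python
from together import Together  # assumes the `together` SDK and a TOGETHER_API_KEY env var

client = Together()
# Assumption: the model is (or was) served under its Hugging Face repo name.
resp = client.completions.create(
    model="togethercomputer/LLaMA-2-7B-32K",
    prompt="Summarize the following document:\n...",
    max_tokens=256,
)
print(resp.choices[0].text)
```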
@@ -87,8 +87,8 @@ You can use this model directly from the Hugging Face Model Hub or fine-tune it
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-tokenizer = AutoTokenizer.from_pretrained("togethercomputer/
-model = AutoModelForCausalLM.from_pretrained("togethercomputer/
+tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
+model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K", trust_remote_code=True, torch_dtype=torch.float16)
 
 input_context = "Your text here"
 input_ids = tokenizer.encode(input_context, return_tensors="pt")
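The snippet in this hunk stops at encoding (the rest of the README's example lies outside the diff window), and `torch_dtype=torch.float16` implies an `import torch` that is not visible here. A minimal, self-contained sketch of how the full example might run, with generation settings that are assumptions rather than the README's:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "togethercomputer/LLaMA-2-7B-32K"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,    # enables the repo's flash-attention code path, per the README
    torch_dtype=torch.float16,
    device_map="auto",         # assumption: requires the `accelerate` package
)

input_context = "Your text here"
input_ids = tokenizer.encode(input_context, return_tensors="pt").to(model.device)

# Generation parameters below are illustrative assumptions, not the README's.
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```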
@@ -102,7 +102,7 @@ Alternatively, you can set `trust_remote_code=False` if you prefer not to use fl
 
 ## Limitations and Bias
 
-As with all language models,
+As with all language models, LLaMA-2-7B-32K may generate incorrect or biased content. It's important to keep this in mind when using the model.
 
 ## Community
 