juewang committed
Commit 0a90abc
Parents: cf6ad2b, 89a7e75

Merge branch 'main' of https://huggingface.co./togethercomputer/Llama-2-7B-32KCtx-v0.1

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -1,3 +1,36 @@
---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T
- togethercomputer/RedPajama-Data-Instruct
- EleutherAI/pile
language:
- en
library_name: transformers
---

# Llama-2-7B-32KCtx

# Install Flash Attention for Inference with a 32K Context

```bash
export CUDA_HOME=/usr/local/cuda-11.8
pip install ninja
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```
Adjust the `CUDA_HOME` path to match your CUDA installation; `ninja` is needed to speed up compilation of the CUDA kernels.
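
To confirm the packages built correctly before loading the model, a quick sanity check along these lines should work (the `rotary_emb` module name is an assumption about what the `csrc/rotary` install exposes):

```python
import torch

# A CUDA-capable GPU must be visible for the Flash Attention kernels
print("CUDA available:", torch.cuda.is_available())

# Flash Attention itself should import and report a version
import flash_attn
print("flash-attn version:", flash_attn.__version__)

# The rotary-embedding CUDA extension (assumed module name: rotary_emb)
import rotary_emb  # noqa: F401
print("rotary kernel: OK")
```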

Then load the model; `trust_remote_code=True` allows `transformers` to use the custom modeling code shipped with this repository:
```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('togethercomputer/Llama-2-7B-32KCtx-v0.1', trust_remote_code=True, torch_dtype=torch.float16)
```
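
Once the model is loaded, long-context generation can look roughly like the sketch below; the tokenizer is assumed to load from the same repository, and the device placement and generation settings are illustrative choices, not requirements:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'togethercomputer/Llama-2-7B-32KCtx-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16
).cuda()

# With a 32K context, the prompt can hold a long document plus a question.
prompt = "<your long document here>\n\nQuestion: What is the main finding?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
# Strip the prompt tokens and decode only the newly generated text
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```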

You can also use vanilla `transformers` to load this model:
```python
model = AutoModelForCausalLM.from_pretrained('togethercomputer/Llama-2-7B-32KCtx-v0.1', torch_dtype=torch.float16)
```
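
Either way, it is worth confirming the extended context window is actually configured; assuming the repository records it in the standard `max_position_embeddings` field, a quick check looks like this:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained('togethercomputer/Llama-2-7B-32KCtx-v0.1')
# Expected to report the 32K (32768-token) window if the extended
# context length is stored in the standard field.
print(config.max_position_embeddings)
```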

TODO