shreyans92dhankhar committed
Commit 06d8f4b
1 Parent(s): 783fe9d

Update README.md

Files changed (1): README.md (+44 -1)

README.md CHANGED
@@ -19,7 +19,7 @@ Instruction tuned model using FlanT5-XXL on data generated via ChatGPT for genera
 
  <!-- Provide a longer summary of what this model is/does. -->
 
- - **Developed by:** Jaykumar Kasundra, Shreyans Dhankhar(@shreyans92dhankhar)
+ - **Developed by:** Jaykumar Kasundra, Shreyans Dhankhar
  - **Model type:** Language model
  - **Language(s) (NLP):** en
  - **License:** other
@@ -29,6 +29,49 @@ Instruction tuned model using FlanT5-XXL on data generated via ChatGPT for genera
 
  # Uses
 
+
+ </details>
+
+ ### Running the model on a GPU using different precisions
+
+ #### FP16
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate peft bitsandbytes
+ import torch
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ from peft import PeftModel, PeftConfig
+ tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+ model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.float16)
+ input_text = "translate English to German: How old are you?"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
+ #### INT8
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install bitsandbytes accelerate
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+ model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", load_in_8bit=True)
+ input_text = "translate English to German: How old are you?"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
  ## Direct Use
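The point of the FP16 and INT8 variants in this commit is memory: each precision halves the bytes stored per weight. A minimal standard-library sketch of that arithmetic (no GPU or model download needed), assuming flan-t5-xxl's roughly 11B parameters and counting weight storage only, not activations or KV cache:

```python
import struct

# Bytes needed to store one weight at each precision.
# struct format codes: 'f' = IEEE 754 float32, 'e' = float16, 'b' = int8.
fp32_bytes = len(struct.pack("f", 3.14159))  # 4
fp16_bytes = len(struct.pack("e", 3.14159))  # 2
int8_bytes = len(struct.pack("b", 42))       # 1

# Rough weight-memory estimate for an ~11B-parameter model such as flan-t5-xxl.
n_params = 11e9
for name, size in [("fp32", fp32_bytes), ("fp16", fp16_bytes), ("int8", int8_bytes)]:
    print(f"{name}: {size} B/param, ~{n_params * size / 1e9:.0f} GB of weights")
```

So full-precision weights alone are around 44 GB, which is why the snippets above pass `torch_dtype=torch.float16` or `load_in_8bit=True` together with `device_map="auto"` to fit (or shard) the model on available GPUs.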