shreyans92dhankhar committed
Commit 06d8f4b
1 Parent(s): 783fe9d

Update README.md

Files changed (1): README.md (+44 -1)

README.md CHANGED
@@ -19,7 +19,7 @@ Instruction tuned model using FlanT5-XXL on data generated via ChatGPT for genera
 
  <!-- Provide a longer summary of what this model is/does. -->
 
- - **Developed by:** Jaykumar Kasundra, Shreyans Dhankhar(@shreyans92dhankhar)
+ - **Developed by:** Jaykumar Kasundra, Shreyans Dhankhar
  - **Model type:** Language model
  - **Language(s) (NLP):** en
  - **License:** other
@@ -29,6 +29,49 @@ Instruction tuned model using FlanT5-XXL on data generated via ChatGPT for genera
 
  # Uses
 
+
+ </details>
+
+ ### Running the model on a GPU using different precisions
+
+ #### FP16
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install accelerate peft bitsandbytes
+ import torch
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ from peft import PeftModel, PeftConfig
+ tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+ model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", torch_dtype=torch.float16)
+ input_text = "translate English to German: How old are you?"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
+ #### INT8
+
+ <details>
+ <summary> Click to expand </summary>
+
+ ```python
+ # pip install bitsandbytes accelerate
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
+ model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto", load_in_8bit=True)
+ input_text = "translate English to German: How old are you?"
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(input_ids)
+ print(tokenizer.decode(outputs[0]))
+ ```
+
+ </details>
+
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
  ## Direct Use
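The point of the FP16 and INT8 variants in this commit is memory: each precision halves the bytes stored per weight. A minimal standard-library sketch of that arithmetic (no GPU or model download needed), assuming flan-t5-xxl's roughly 11B parameters and counting weight storage only, not activations or KV cache:

```python
import struct

# Bytes needed to store one weight at each precision.
# struct format codes: 'f' = IEEE 754 float32, 'e' = float16, 'b' = int8.
fp32_bytes = len(struct.pack("f", 3.14159))  # 4
fp16_bytes = len(struct.pack("e", 3.14159))  # 2
int8_bytes = len(struct.pack("b", 42))       # 1

# Rough weight-memory estimate for an ~11B-parameter model such as flan-t5-xxl.
n_params = 11e9
for name, size in [("fp32", fp32_bytes), ("fp16", fp16_bytes), ("int8", int8_bytes)]:
    print(f"{name}: {size} B/param, ~{n_params * size / 1e9:.0f} GB of weights")
```

So full-precision weights alone are around 44 GB, which is why the snippets above pass `torch_dtype=torch.float16` or `load_in_8bit=True` together with `device_map="auto"` to fit (or shard) the model on available GPUs.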