luyaojie committed
Commit df99f27 · verified · 1 Parent(s): 6582be5

Update README.md

Files changed (1):
  1. README.md +86 -0
README.md CHANGED
@@ -44,6 +44,92 @@ We present a fine-tuned model for formal verification tasks. It is fine-tuned in
  <img width="60%" src="figures/data-prepare.png">
  </p>
 
+ ## Quickstart
+ The following snippet uses `apply_chat_template` to show how to load the tokenizer and the model and how to run inference on fmbench.
+
+ ``` python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ instruct = """
+ Translate the given requirement using TLA's syntax and semantics.
+ You only need to return the TLA formal specification without explanation.
+ """
+
+ input_text = """
+ An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+ - The control state `octl[p]` is equal to `\"done\"`.
+ - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+ - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+ - The variables `omem` and `obuf` remain unchanged.
+ """
+
+ model_name = "fm-universe/deepseek-coder-7b-instruct-v1.5-fma"
+
+ # Load the model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name, torch_dtype="auto", device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Build the chat prompt from the instruction and the requirement
+ messages = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+
+ text = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate, then drop the prompt tokens from each output sequence
+ generated_ids = model.generate(**model_inputs, max_new_tokens=4096)
+ generated_ids = [
+     output_ids[len(input_ids) :]
+     for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
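A minimal variation on the snippet above: to make decoding deterministic (matching the greedy `temperature=0` sampling used in the vLLM example below), `generate` accepts `do_sample=False`.

``` python
# Greedy decoding instead of the default sampling configuration.
generated_ids = model.generate(**model_inputs, max_new_tokens=4096, do_sample=False)
```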
+
+ ## Example of Offline Inference
+
+ vLLM supports offline inference; the example below runs the same requirement through `llm.chat` with greedy sampling.
+
+ ``` python
+ from vllm import LLM, SamplingParams
+
+ instruct = """
+ Translate the given requirement using TLA's syntax and semantics.
+ You only need to return the TLA formal specification without explanation.
+ """
+
+ input_text = """
+ An operation `LM_Inner_Rsp(p)` that represents a response process for a given parameter `p`. It satisfies the following conditions:
+ - The control state `octl[p]` is equal to `\"done\"`.
+ - The `Reply(p, obuf[p], memInt, memInt')` operation is executed.
+ - The control state `octl` is updated by setting the `p` index of `octl` to `\"rdy\"`.
+ - The variables `omem` and `obuf` remain unchanged.
+ """
+
+ model_name = "fm-universe/deepseek-coder-7b-instruct-v1.5-fma"
+
+ # Greedy decoding; max_tokens caps the maximum generation length.
+ greedy_sampling = SamplingParams(temperature=0, max_tokens=4096)
+
+ # Load the model
+ llm = LLM(
+     model=model_name,
+     tensor_parallel_size=1,
+     max_model_len=4096,
+     enable_chunked_prefill=True,
+     # quantization="fp8",  # Enabling FP8 quantization for model weights can reduce memory usage.
+ )
+
+ # Prepare chat messages
+ chat_message = [{"role": "user", "content": f"{instruct}\n{input_text}"}]
+
+ # Inference
+ responses = llm.chat(chat_message, greedy_sampling, use_tqdm=True)
+
+ print(responses[0].outputs[0].text)
+ ```
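A minimal sketch of batching several requirements in one call, continuing the snippet above. It assumes a recent vLLM release whose `LLM.chat` accepts a list of conversations; `requirements` is a hypothetical list of requirement strings.

``` python
# Hypothetical batch: each requirement becomes its own single-turn conversation.
requirements = [input_text]  # extend with more requirement strings as needed
conversations = [
    [{"role": "user", "content": f"{instruct}\n{req}"}] for req in requirements
]

batch_responses = llm.chat(conversations, greedy_sampling, use_tqdm=True)
for r in batch_responses:
    print(r.outputs[0].text)
```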
 
  ## Citation
  ```