Safetensors · qwen2
hanbin committed a6d3144 (verified) · Parent: bd7430e

Update README.md

Files changed (1): README.md (+46, −0)

README.md CHANGED
@@ -38,6 +38,52 @@ We apply tailored prompts for coding and math tasks:
  {question} + "\n\nPresent the answer in LaTex format: \\boxed{Your answer}"
  ```
 
+```python
+import os
+from tqdm import tqdm
+import torch
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+
+os.environ["NCCL_IGNORE_DISABLED_P2P"] = "1"
+os.environ["TOKENIZERS_PARALLELISM"] = "true"
+
+def generate(question_list, model_path):
+    llm = LLM(
+        model=model_path,
+        trust_remote_code=True,
+        tensor_parallel_size=torch.cuda.device_count(),
+        gpu_memory_utilization=0.90,
+    )
+    sampling_params = SamplingParams(max_tokens=8192,
+                                     temperature=0.0,
+                                     n=1)
+    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
+    completions = [[output.text for output in output_item.outputs] for output_item in outputs]
+    return completions
+
+def make_conv_hf(question, tokenizer):
+    # for math problems
+    content = question + "\n\nPresent the answer in LaTex format: \\boxed{Your answer}"
+    # for code problems
+    # content = question + "\n\nWrite Python code to solve the problem. Present the code in \n```python\nYour code\n```\nat the end."
+    msg = [
+        {"role": "user", "content": content}
+    ]
+    chat = tokenizer.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
+    return chat
+
+def run():
+    model_path = "PRIME-RL/Eurus-2-7B-SFT"
+    all_problems = [
+        "which number is larger? 9.11 or 9.9?"
+    ]
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+    completions = generate([make_conv_hf(problem, tokenizer) for problem in all_problems], model_path)
+    print(completions)
+    # [['[ASSESS]\n\n# The task is to compare two decimal numbers, 9.11 and 9.9, to determine which one is larger.\n# The numbers are in a standard decimal format, making direct comparison possible.\n# No additional information or context is provided that could affect the comparison.\n\nNext action: [ADVANCE]\n\n[ADVANCE]\n\n# To compare the two numbers, I will examine their whole and decimal parts separately.\n# The whole part of both numbers is 9, so I will focus on the decimal parts.\n# The decimal part of 9.11 is 0.11, and the decimal part of 9.9 is 0.9.\n# Since 0.9 is greater than 0.11, I can conclude that 9.9 is larger than 9.11.\n\nNext action: [VERIFY]\n\n[VERIFY]\n\n# I will review my comparison of the decimal parts to ensure accuracy.\n# Upon re-examination, I confirm that 0.9 is indeed greater than 0.11.\n# I also consider the possibility of a mistake in my initial assessment, but the comparison seems straightforward.\n# I evaluate my process and conclude that it is sound, as I correctly identified the whole and decimal parts of the numbers and compared them accurately.\n# No potential errors or inconsistencies are found in my reasoning.\n\nNext action: [OUTPUT]\n\n[OUTPUT]\n\nTo determine which number is larger, 9.11 or 9.9, I compared their whole and decimal parts. Since the whole parts are equal, I focused on the decimal parts, finding that 0.9 is greater than 0.11. After verifying my comparison, I concluded that 9.9 is indeed larger than 9.11.\n\n\\boxed{9.9}\n\n']]
+
+if __name__ == "__main__":
+    run()
+```
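Since the prompt asks the model to end with a `\boxed{...}` answer, the final answer can be pulled out of a completion with a short regex helper. This is a minimal sketch, not part of the repository: `extract_boxed` is a hypothetical name, and it assumes the completion actually follows the `\boxed{}` format requested by the prompt above.

```python
import re

def extract_boxed(completion):
    # Hypothetical helper (not from the repository): return the contents of
    # the last \boxed{...} in the completion, or None if none is present.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

sample = "After verifying my comparison, I concluded that 9.9 is larger.\n\n\\boxed{9.9}\n"
print(extract_boxed(sample))  # -> 9.9
```

Taking the last match is deliberate: reasoning traces like the one above may mention `\boxed{}` earlier, but the final answer is the last occurrence.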
+
 ## Evaluation
 
 After finetuning, the performance of our Eurus-2-7B-SFT is shown in the following figure.