Safetensors · qwen2
hanbin committed a6d3144 (verified) · Parent: bd7430e

Update README.md

Files changed (1): README.md (+46, −0)

README.md CHANGED
@@ -38,6 +38,52 @@ We apply tailored prompts for coding and math tasks:
  {question} + "\n\nPresent the answer in LaTex format: \\boxed{Your answer}"
  ```
 
+```python
+import os
+from tqdm import tqdm
+import torch
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+
+os.environ["NCCL_IGNORE_DISABLED_P2P"] = "1"
+os.environ["TOKENIZERS_PARALLELISM"] = "true"
+
+def generate(question_list, model_path):
+    llm = LLM(
+        model=model_path,
+        trust_remote_code=True,
+        tensor_parallel_size=torch.cuda.device_count(),
+        gpu_memory_utilization=0.90,
+    )
+    sampling_params = SamplingParams(max_tokens=8192,
+                                     temperature=0.0,
+                                     n=1)
+    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
+    completions = [[output.text for output in output_item.outputs] for output_item in outputs]
+    return completions
+
+def make_conv_hf(question, tokenizer):
+    # for math problems
+    content = question + "\n\nPresent the answer in LaTex format: \\boxed{Your answer}"
+    # for code problems
+    # content = question + "\n\nWrite Python code to solve the problem. Present the code in \n```python\nYour code\n```\nat the end."
+    msg = [
+        {"role": "user", "content": content}
+    ]
+    chat = tokenizer.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
+    return chat
+
+def run():
+    model_path = "PRIME-RL/Eurus-2-7B-SFT"
+    all_problems = [
+        "which number is larger? 9.11 or 9.9?"
+    ]
+    tokenizer = AutoTokenizer.from_pretrained(model_path)
+    completions = generate([make_conv_hf(problem, tokenizer) for problem in all_problems], model_path)
+    print(completions)
+    # [['[ASSESS]\n\n# The task is to compare two decimal numbers, 9.11 and 9.9, to determine which one is larger.\n# The numbers are in a standard decimal format, making direct comparison possible.\n# No additional information or context is provided that could affect the comparison.\n\nNext action: [ADVANCE]\n\n[ADVANCE]\n\n# To compare the two numbers, I will examine their whole and decimal parts separately.\n# The whole part of both numbers is 9, so I will focus on the decimal parts.\n# The decimal part of 9.11 is 0.11, and the decimal part of 9.9 is 0.9.\n# Since 0.9 is greater than 0.11, I can conclude that 9.9 is larger than 9.11.\n\nNext action: [VERIFY]\n\n[VERIFY]\n\n# I will review my comparison of the decimal parts to ensure accuracy.\n# Upon re-examination, I confirm that 0.9 is indeed greater than 0.11.\n# I also consider the possibility of a mistake in my initial assessment, but the comparison seems straightforward.\n# I evaluate my process and conclude that it is sound, as I correctly identified the whole and decimal parts of the numbers and compared them accurately.\n# No potential errors or inconsistencies are found in my reasoning.\n\nNext action: [OUTPUT]\n\n[OUTPUT]\n\nTo determine which number is larger, 9.11 or 9.9, I compared their whole and decimal parts. Since the whole parts are equal, I focused on the decimal parts, finding that 0.9 is greater than 0.11. After verifying my comparison, I concluded that 9.9 is indeed larger than 9.11.\n\n\\boxed{9.9}\n\n']]
+
+if __name__ == "__main__":
+    run()
+```
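Since the prompt asks the model to end with a `\boxed{...}` answer, the final answer can be pulled out of a completion with a short regex helper. This is a minimal sketch, not part of the repository: `extract_boxed` is a hypothetical name, and it assumes the completion actually follows the `\boxed{}` format requested by the prompt above.

```python
import re

def extract_boxed(completion):
    # Hypothetical helper (not from the repository): return the contents of
    # the last \boxed{...} in the completion, or None if none is present.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

sample = "After verifying my comparison, I concluded that 9.9 is larger.\n\n\\boxed{9.9}\n"
print(extract_boxed(sample))  # -> 9.9
```

Taking the last match is deliberate: reasoning traces like the one above may mention `\boxed{}` earlier, but the final answer is the last occurrence.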
+
 ## Evaluation
 
 After finetuning, the performance of our Eurus-2-7B-SFT is shown in the following figure.