Spaces: leonard-dls/benchmark_data_contamination

leonard-dls committed 3 days ago

Commit d126096 (parent: 6136624)

change model order

Files changed (1)

app.py CHANGED

@@ -11,8 +11,8 @@ with open("phi4_gsm8k_output.jsonl", "r") as file:
     phi4_dict = [json.loads(line) for line in file]
 models_data = {
+    "microsoft/phi-4" : phi4_dict,
     "Qwen/Qwen2.5-14B" : qwen_dict,
-    "microsoft/phi-4" : phi4_dict
 }
 starting_index = 0
@@ -26,8 +26,8 @@ This space aims to partially reproduce this work.
 I chose to look at the contamination of **Qwen/Qwen2.5-14B** and **microsoft/phi-4** by **GSM8K** dataset.
-For **Qwen/Qwen2.5-14B** I found **729** GSM8K examples that had a least a 0.9 text similarity ratio between generated and original.
 For **microsoft/phi-4** I found **172** GSM8K examples that had a least a 0.9 text similarity ratio between generated and original.
+For **Qwen/Qwen2.5-14B** I found **729** GSM8K examples that had a least a 0.9 text similarity ratio between generated and original.
 """