leonard-dls committed on
Commit · 5c4ad30
1 Parent(s): 7675e28
change description
app.py CHANGED
@@ -13,6 +13,8 @@ This Space is inspired by [Luis Hunt's](https://www.linkedin.com/posts/louiswhun
He highlights how current top-performing models from major vendors are contaminated with benchmark data that is supposed to be used to assess their performance.

This Space aims to partially reproduce this work. I chose to look at the contamination of **Qwen/Qwen2.5-14B** by the **GSM8K** dataset.
+
+I found **729** GSM8K examples that had at least a 0.9 text similarity ratio between the generated and the original text.
"""
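The commit does not say how the 0.9 text similarity ratio is computed. As a minimal sketch, assuming the ratio is something like Python's `difflib.SequenceMatcher.ratio()`, the count could be obtained as follows. The function names `similarity_ratio` and `count_contaminated` and the sample pairs are hypothetical and are not taken from `app.py`.

```python
from difflib import SequenceMatcher


def similarity_ratio(generated: str, original: str) -> float:
    """Return a similarity score in [0, 1]; 1.0 means the strings are identical.

    Assumption: the Space may use a different metric. SequenceMatcher is
    only a stand-in for illustration.
    """
    return SequenceMatcher(None, generated, original).ratio()


def count_contaminated(pairs, threshold=0.9):
    """Count (generated, original) pairs whose ratio is at least `threshold`."""
    return sum(
        1 for generated, original in pairs
        if similarity_ratio(generated, original) >= threshold
    )


# Hypothetical (generated, original) pairs, not real GSM8K data.
pairs = [
    ("She sells 3 apples for $2 each.", "She sells 3 apples for $2 each."),
    ("The train left at noon.", "How many eggs does Janet sell per day?"),
]
flagged = count_contaminated(pairs, threshold=0.9)
```

Here only the first pair is flagged, because the model output reproduces the benchmark text verbatim. Applied to every GSM8K example, a count of this kind would correspond to the figure reported in the added line, provided the metric matches the one the author used.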