Update README.md
README.md
CHANGED
@@ -14,11 +14,73 @@ tags:
 - benchmark
 ---
 
-
+<style>
+button {
+/* margin: calc(20vw / 100); */
+margin: 0.5em;
+padding-left: calc(40vw / 100);
+padding-right: calc(40vw / 100);
+padding-bottom: calc(0vw / 100);
+text-align: center;
+font-size: 12px;
+height: 25px;
+/* padding-left: calc(40vw / 100);
+padding-right: calc(40vw / 100);
+padding-bottom: calc(0vw / 100);
+text-align: center;
+font-size: calc(60vw / 100);
+height: calc(120vw / 100); */
+transition: 0.5s;
+background-size: 200% auto;
+color: white;
+border-radius: calc(60vw / 100);
+display: inline;
+/* border: 2px solid black; */
+font-weight: 500;
+box-shadow: 0px 0px 14px -7px #f09819;
+background-image: linear-gradient(45deg, #FF512F 0%, #F09819 51%, #FF512F 100%);
+cursor: pointer;
+user-select: none;
+-webkit-user-select: none;
+touch-action: manipulation;
+}
+
+button:hover {
+background-position: right center;
+/* change the direction of the change here */
+color: #fff;
+text-decoration: none;
+}
+
+button:active {
+transform: scale(0.95);
+}
+</style>
 
-
+# 🤖 Automated Safety Refusal Evaluator for SORRY-Bench (2024/06)
 
-
+<!-- 📑[**SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors**](https://sorry-bench.github.io) -->
+
+<a href="https://sorry-bench.github.io" style="text-decoration:none">
+<button>🏠Website </button>
+</a>
+<a href="http://arxiv.org/abs/2406.14598" style="text-decoration:none">
+<button>📑Paper </button>
+</a>
+<a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-202406" style="text-decoration:none">
+<button>📚Dataset </button>
+</a>
+<a href="https://github.com/SORRY-Bench/SORRY-Bench" style="text-decoration:none">
+<button>💻Github </button>
+</a>
+<a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406" style="text-decoration:none">
+<button>🧑‍⚖️Human Judgment Dataset </button>
+</a>
+<a href="https://huggingface.co/sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406" style="text-decoration:none">
+<button>🤖Judge LLM </button>
+</a>
+
+<!-- 🏠[**[Website]**](https://sorry-bench.github.io) 📚[**[Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-202406) 💻[**[Github]**](https://github.com/SORRY-Bench/SORRY-Bench) 🧑‍⚖️[**[Human Judgment Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406) -->
 
 This model is our adopted automated safety refusal evaluator for [SORRY-Bench](https://sorry-bench.github.io).
 We obtained this safety judge model by fine-tuning [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on our collected [human judge dataset](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406).