vtu81 committed on
Commit 9f7a940
1 Parent(s): fb7bb31

Update README.md

Files changed (1)
  1. README.md +65 -3
README.md CHANGED
@@ -14,11 +14,73 @@ tags:
  - benchmark
  ---
 
- # 🤖 Automated Safety Refusal Evaluator for SORRY-Bench (2024/06)
 
- 📑[**SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors**](https://sorry-bench.github.io)
 
- 🏠[**[Website]**](https://sorry-bench.github.io)     📚[**[Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-202406)     💻[**[Github]**](https://github.com/SORRY-Bench/SORRY-Bench)     🧑‍⚖️[**[Human Judgment Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406)
 
  This model is our adopted automated safety refusal evaluator for [SORRY-Bench](https://sorry-bench.github.io).
  We obtained this safety judge model by fine-tuning [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on our collected [human judge dataset](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406).
 
  - benchmark
  ---
 
+ <style>
+ button {
+ /* margin: calc(20vw / 100); */
+ margin: 0.5em;
+ padding-left: calc(40vw / 100);
+ padding-right: calc(40vw / 100);
+ padding-bottom: calc(0vw / 100);
+ text-align: center;
+ font-size: 12px;
+ height: 25px;
+ /* padding-left: calc(40vw / 100);
+ padding-right: calc(40vw / 100);
+ padding-bottom: calc(0vw / 100);
+ text-align: center;
+ font-size: calc(60vw / 100);
+ height: calc(120vw / 100); */
+ transition: 0.5s;
+ background-size: 200% auto;
+ color: white;
+ border-radius: calc(60vw / 100);
+ display: inline;
+ /* border: 2px solid black; */
+ font-weight: 500;
+ box-shadow: 0px 0px 14px -7px #f09819;
+ background-image: linear-gradient(45deg, #FF512F 0%, #F09819 51%, #FF512F 100%);
+ cursor: pointer;
+ user-select: none;
+ -webkit-user-select: none;
+ touch-action: manipulation;
+ }
+
+ button:hover {
+ background-position: right center;
+ /* change the direction of the change here */
+ color: #fff;
+ text-decoration: none;
+ }
+
+ button:active {
+ transform: scale(0.95);
+ }
+ </style>
 
+ # 🤖 Automated Safety Refusal Evaluator for SORRY-Bench (2024/06)
 
+ <!-- 📑[**SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors**](https://sorry-bench.github.io) -->
+
+ <a href="https://sorry-bench.github.io" style="text-decoration:none">
+ <button>🏠Website </button>
+ </a>
+ <a href="http://arxiv.org/abs/2406.14598" style="text-decoration:none">
+ <button>📑Paper </button>
+ </a>
+ <a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-202406" style="text-decoration:none">
+ <button>📚Dataset </button>
+ </a>
+ <a href="https://github.com/SORRY-Bench/SORRY-Bench" style="text-decoration:none">
+ <button>💻Github </button>
+ </a>
+ <a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406" style="text-decoration:none">
+ <button>🧑‍⚖️Human Judgment Dataset </button>
+ </a>
+ <a href="https://huggingface.co/sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406" style="text-decoration:none">
+ <button>🤖Judge LLM </button>
+ </a>
+
+ <!-- 🏠[**[Website]**](https://sorry-bench.github.io) &nbsp;&nbsp;&nbsp; 📚[**[Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-202406) &nbsp;&nbsp;&nbsp; 💻[**[Github]**](https://github.com/SORRY-Bench/SORRY-Bench) &nbsp;&nbsp;&nbsp; 🧑‍⚖️[**[Human Judgment Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406) -->
 
  This model is our adopted automated safety refusal evaluator for [SORRY-Bench](https://sorry-bench.github.io).
  We obtained this safety judge model by fine-tuning [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on our collected [human judge dataset](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406).