sorry-bench
/

ft-mistral-7b-instruct-v0.2-sorry-bench-202406

@@ -14,11 +14,73 @@ tags:
 - benchmark
 ---
-# 🤖 Automated Safety Refusal Evaluator for SORRY-Bench (2024/06)
-📑[**SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors**](https://sorry-bench.github.io)
-🏠[**[Website]**](https://sorry-bench.github.io) &nbsp;&nbsp;&nbsp; 📚[**[Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-202406) &nbsp;&nbsp;&nbsp; 💻[**[Github]**](https://github.com/SORRY-Bench/SORRY-Bench) &nbsp;&nbsp;&nbsp; 🧑‍⚖️[**[Human Judgment Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406)
 This model is our adopted automated safety refusal evaluator for [SORRY-Bench](https://sorry-bench.github.io).
 We obtained this safety judge model by fine-tuning [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on our collected [human judge dataset](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406).

 - benchmark
 ---
+<style>
+  button {
+    /* margin: calc(20vw / 100); */
+    margin: 0.5em;
+    padding-left:  calc(40vw / 100);
+    padding-right:  calc(40vw / 100);
+    padding-bottom: calc(0vw / 100);
+    text-align: center;
+    font-size: 12px;
+    height: 25px;
+    /* padding-left:  calc(40vw / 100);
+    padding-right:  calc(40vw / 100);
+    padding-bottom: calc(0vw / 100);
+    text-align: center;
+    font-size: calc(60vw / 100);
+    height: calc(120vw / 100); */
+    transition: 0.5s;
+    background-size: 200% auto;
+    color: white;
+    border-radius: calc(60vw / 100);
+    display: inline;
+    /* border: 2px solid black; */
+    font-weight: 500;
+    box-shadow: 0px 0px 14px -7px #f09819;
+    background-image: linear-gradient(45deg, #FF512F 0%, #F09819 51%, #FF512F 100%);
+    cursor: pointer;
+    user-select: none;
+    -webkit-user-select: none;
+    touch-action: manipulation;
+  }
+  button:hover {
+    background-position: right center;
+    /* change the direction of the change here */
+    color: #fff;
+    text-decoration: none;
+  }
+  button:active {
+    transform: scale(0.95);
+  }
+</style>
+# 🤖 Automated Safety Refusal Evaluator for SORRY-Bench (2024/06)
+<!-- 📑[**SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors**](https://sorry-bench.github.io) -->
+<a href="https://sorry-bench.github.io" style="text-decoration:none">
+  <button>🏠Website </button>
+</a>
+<a href="http://arxiv.org/abs/2406.14598" style="text-decoration:none">
+  <button>📑Paper </button>
+</a>
+<a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-202406" style="text-decoration:none">
+  <button>📚Dataset </button>
+</a>
+<a href="https://github.com/SORRY-Bench/SORRY-Bench" style="text-decoration:none">
+  <button>💻Github </button>
+</a>
+<a href="https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406" style="text-decoration:none">
+  <button>🧑‍⚖️Human Judgment Dataset </button>
+</a>
+<a href="https://huggingface.co/sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406" style="text-decoration:none">
+  <button>🤖Judge LLM </button>
+</a>
+<!-- 🏠[**[Website]**](https://sorry-bench.github.io) &nbsp;&nbsp;&nbsp; 📚[**[Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-202406) &nbsp;&nbsp;&nbsp; 💻[**[Github]**](https://github.com/SORRY-Bench/SORRY-Bench) &nbsp;&nbsp;&nbsp; 🧑‍⚖️[**[Human Judgment Dataset]**](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406) -->
 This model is our adopted automated safety refusal evaluator for [SORRY-Bench](https://sorry-bench.github.io).
 We obtained this safety judge model by fine-tuning [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on our collected [human judge dataset](https://huggingface.co/datasets/sorry-bench/sorry-bench-human-judgment-202406).