LLM-Tuning-Safety

university

https://llm-tuning-safety.github.io/

AI & ML interests

None defined yet.

Recent Activity

vtu81 authored a paper 6 months ago

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

vtu81 authored a paper 6 months ago

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

vtu81 authored a paper 6 months ago

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

models

None public yet

datasets 1

LLM-Tuning-Safety/HEx-PHI

Preview • Updated Aug 19 • 124 • 34