mattmdjaga posted an update Oct 15
🚨 New Agent Benchmark 🚨
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

A collaboration between the UK AI Safety Institute and Gray Swan AI, AgentHarm is a dataset for measuring the harmfulness of LLM agents.

The benchmark contains both harmful and benign task sets across 11 categories, with varied difficulty levels and detailed evaluation that tests not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics for a wide range of models in both no-attack and prompt-attack scenarios.
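To make the two kinds of metric concrete, here is a minimal sketch, assuming a hypothetical per-task record. The `TaskResult` fields, the scenario labels, and the tool names are illustrative only; they are not the AgentHarm data schema or its official grading code. The sketch computes a refusal rate and a simple tool-level accuracy (here, the fraction of expected tools the agent actually called), grouped by attack scenario.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative, not the AgentHarm schema.
    scenario: str                                       # e.g. "no_attack" or "prompt_attack"
    refused: bool                                       # did the agent refuse the task?
    expected_tools: list = field(default_factory=list)  # tools the task expects to be called
    called_tools: list = field(default_factory=list)    # tools the agent actually called


def tool_accuracy(r: TaskResult) -> float:
    """Fraction of expected tools that the agent actually called (1.0 if none expected)."""
    if not r.expected_tools:
        return 1.0
    called = set(r.called_tools)
    return sum(t in called for t in r.expected_tools) / len(r.expected_tools)


def summarize(results):
    """Return {scenario: (refusal_rate, mean_tool_accuracy)} over all tasks in each scenario."""
    groups = defaultdict(list)
    for r in results:
        groups[r.scenario].append(r)
    summary = {}
    for scenario, rs in groups.items():
        refusal_rate = sum(r.refused for r in rs) / len(rs)
        mean_acc = sum(tool_accuracy(r) for r in rs) / len(rs)
        summary[scenario] = (refusal_rate, mean_acc)
    return summary


# Usage with made-up records (hypothetical tool names):
results = [
    TaskResult("no_attack", True, ["search_web", "send_email"], []),
    TaskResult("no_attack", False, ["search_web", "send_email"], ["search_web"]),
    TaskResult("prompt_attack", False, ["search_web"], ["search_web"]),
]
summary = summarize(results)
print(summary)  # {'no_attack': (0.5, 0.25), 'prompt_attack': (0.0, 1.0)}
```

In this sketch a refused task scores 0 on tool accuracy, so refusals pull the accuracy figure down; the benchmark's own grading may combine refusal and accuracy differently.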

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)