arxiv:2501.18837
Hoagy Cunningham
HoagyC
AI & ML interests
None yet
Recent Activity
authored a paper 1 day ago
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
authored a paper over 1 year ago
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Organizations
None yet