AMindToThink/sae_jailbreak_unlearning

About

Investigating how well intervening on Sparse Autoencoder internals prevents adversaries from accessing dangerous knowledge.
