Tools
We are developing a suite of open-source tools for technical AI safety research, focusing on mechanistic interpretability and automated capability evaluation.
InterpSuite (Coming Soon)
A unified framework for activation patching and circuit discovery across transformer architectures.
EvalRig
Standardized environments for testing models for dangerous capabilities in code generation and autonomous operation.