Tools

We are developing a suite of open-source tools for technical AI safety research, focusing on mechanistic interpretability and automated capability evaluation.

InterpSuite (Coming Soon)

A unified framework for activation patching and circuit discovery across transformer architectures.
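
For readers unfamiliar with the term, activation patching can be sketched on a toy model: run the network on a clean input and on a corrupted input, then copy one clean activation at a time into the corrupted run and record how far the output moves. The sketch below is not InterpSuite code and does not reflect its API, which has not been released. The weights, inputs, and function names (hidden, readout, patch_effects) are all hypothetical and chosen only for illustration.

```python
# Toy activation patching on a two-layer network, in pure Python.
# All weights, inputs, and names here are made up for illustration;
# none of this is InterpSuite's API.

W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hidden-layer weights (3 units, 2 inputs)
W2 = [2.0, -1.0, 0.5]                      # linear readout weights


def hidden(x):
    """ReLU hidden activations for input vector x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def readout(h):
    """Scalar output computed from hidden activations h."""
    return sum(w * hi for w, hi in zip(W2, h))


def patch_effects(clean_x, corrupt_x):
    """Patch each hidden unit from the clean run into the corrupted run.

    Returns, per unit, the change in output relative to the unpatched
    corrupted run. A large value suggests the unit carries information
    that distinguishes the two inputs.
    """
    clean_h = hidden(clean_x)
    corrupt_h = hidden(corrupt_x)
    baseline = readout(corrupt_h)
    effects = []
    for i in range(len(corrupt_h)):
        patched = list(corrupt_h)
        patched[i] = clean_h[i]  # overwrite a single activation
        effects.append(readout(patched) - baseline)
    return effects


clean_x = [1.0, 0.0]
corrupt_x = [0.0, 1.0]
effects = patch_effects(clean_x, corrupt_x)
print(effects)
```

In this toy case the third hidden unit has the same activation on both inputs, so patching it changes nothing, while the first two units account for the full difference between the clean and corrupted outputs. Real transformer experiments patch attention heads or residual-stream positions rather than single hidden units, but the clean-run, corrupted-run, patch-and-compare loop is the same.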

EvalRig

Standardized environments for testing whether models exhibit dangerous capabilities in code generation and autonomous operation.