
Mechanistic Interpretability

We want to understand what models are actually doing when they produce outputs. This is early work, and most questions do not have clear answers yet.

Activation Patching

Attribution Patching

Tracing how information flows through a model by swapping activations at specific sites; attribution patching approximates the effect of each swap with gradients, so many sites can be scored cheaply.
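The swap itself can be sketched on a hand-built toy model. Everything here (the two-layer network, its weights, the inputs) is an illustrative assumption, not a real model or library API: we cache the hidden activations from a clean run, then overwrite the corrupted run's activations at that site.

```python
import numpy as np

# Toy two-layer network: illustrative weights only, not a real model.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Run the toy model; optionally overwrite ("patch") the hidden
    activations with a cached value before the second layer."""
    h = np.maximum(x @ W1, 0.0)  # hidden activations: the patch site
    if patch is not None:
        h = patch                # swap in activations from another run
    return h @ W2

clean = rng.normal(size=4)
corrupted = rng.normal(size=4)

# Cache the clean run's activations at the patch site.
h_clean = np.maximum(clean @ W1, 0.0)

baseline = forward(corrupted)                 # corrupted run, untouched
patched = forward(corrupted, patch=h_clean)   # corrupted run, clean site
clean_out = forward(clean)

# Patching the whole layer exactly restores the clean output; in practice
# one patches a single site at a time to localize where information lives.
print(np.linalg.norm(patched - clean_out))
```

Patching the full hidden layer is the degenerate case; the informative experiments patch one head, neuron, or position at a time and measure how far each swap moves the output.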

Representation Analysis

Polysemanticity

Studying how individual neurons simultaneously respond to many different, unrelated concepts.

Mechanistic Causality

Circuit Tracing

Finding the small subnetworks inside large models that are responsible for specific capabilities.
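One crude way to find such a subnetwork is ablation: zero out each unit in turn and keep the ones whose removal changes the output. The toy model below is a hypothetical construction whose output genuinely depends on only two of its four hidden units, so the procedure recovers exactly that pair.

```python
import numpy as np

# Toy model whose output depends on only 2 of 4 hidden units;
# weights are hand-picked for illustration.
W1 = np.eye(4)
W2 = np.array([1.0, 0.0, 1.0, 0.0])  # output reads units 0 and 2 only

def forward(x, ablate=()):
    h = np.maximum(x @ W1, 0.0)
    h[list(ablate)] = 0.0            # zero-ablate the chosen units
    return h @ W2

x = np.array([1.0, 1.0, 1.0, 1.0])
base = forward(x)

# A unit belongs to the circuit if ablating it changes the output.
circuit = [u for u in range(4) if forward(x, ablate=[u]) != base]
print(circuit)  # → [0, 2]
```

Real circuit tracing works on attention heads and MLP blocks rather than single units, and uses patching or gradient attribution instead of brute-force ablation, but the underlying question is the same: which components does the behavior causally depend on?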