Flabubium

Mechanistic Interpretability

We want to understand what models are actually doing when they produce outputs. This is early work, and most questions do not have clear answers yet.

Activation Patching

Tracing how information flows through a model by swapping activations at specific sites.

Representation Analysis

Studying how individual neurons simultaneously respond to many different, unrelated concepts.

Mechanistic Causality

Finding the small subnetworks inside large models that are responsible for specific capabilities.