About

I'm exploring the role of causality as a language for explainability and oversight of large generative models like ChatGPT. I'm advised by Victor Veitch.
I'm investigating post-hoc, internal interpretability of large language models from a causal inference perspective. Specifically, what does it mean to locate a human-understandable concept inside a model, and how can we validate such claims?
I'm particularly motivated by applications to AI safety such as monitoring long-term planning and deception, but am also excited about applications to fairness and adversarial robustness.

Publications

Stability of Stochastically Switched and Stochastically Time-Delayed Systems

Camille Carter, Jacob Murri, David Reber, Benjamin Webb

arXiv, 2020

Intrinsic stability: stability of dynamical networks and switched systems with any type of time-delays

David Reber, Benjamin Webb

Nonlinearity, vol. 33, 2020, p. 2660

Dynamical Stability despite Time-Varying Network Structure

David Reber, Benjamin Webb

MAA Intermountain Section Conference, SIAM Discrete Mathematics Conference, SIAM Annual Meeting, 2018