David Reber
PhD student, AI Safety researcher, effective altruist
davidpreber [at] gmail [dot] com
Computer Science
University of Chicago
I'm investigating post-hoc, internal interpretability of large language models from a causal inference perspective. Specifically, what does it mean to locate a human-understandable concept, and how can we validate such claims?
I'm particularly motivated by applications to AI safety, such as monitoring long-term planning and deception, but I'm also excited about applications to fairness and adversarial robustness.
Publications
Stability of Stochastically Switched and Stochastically Time-Delayed Systems
Camille Carter, Jacob Murri, David Reber, Benjamin Webb
arXiv, 2020
David Reber, Benjamin Webb
Nonlinearity, vol. 33, 2020, p. 2660
Dynamical Stability despite Time-Varying Network Structure
David Reber, Benjamin Webb
MAA Intermountain Section Conference, SIAM Discrete Mathematics Conference, SIAM Annual Meeting, 2018