Training-Set Influence Analysis and Estimation

Zayd Hammoudeh
Date and time: Fri, Oct 14 2022 - 7:00am
Zayd Hammoudeh
University of Oregon
  • Daniel Lowd (Chair)
  • Thien Nguyen
  • Humphrey Shi

Everything a machine learning model knows comes from the training data. However, for overparameterized deep models, the causal relationship between training data and specific predictions is not well understood. Training-set influence partially demystifies these underlying interactions by quantifying how much each training instance alters the final model. Because measuring influence exactly can be provably hard, influence estimators are used in practice instead. This paper provides the first comprehensive survey of training-set influence and influence estimation. We formalize the various, and in places conflicting, definitions of influence. We then organize the state-of-the-art influence methods into a taxonomy and compare the approaches’ underlying assumptions, complexity, strengths, and limitations. We also propose future research directions to make influence estimation more useful to practitioners as well as more theoretically and empirically sound.