Understanding and Adapting Tree Ensembles: A Training Data Perspective

Jonathan Brophy
Date and time: Mon, Nov 21 2022 - 2:00pm
University of Oregon
  • Daniel Lowd (Chair)
  • Stephen Fickas
  • Thanh Nguyen
  • Ben Hutchinson (Psychology)

Although deep learning models have had impressive success on unstructured data (e.g., images, audio, text), tree-based ensembles such as random forests and gradient-boosted trees are hugely popular and remain the preferred choice for tabular or structured data. Despite their strong predictive performance, tree ensembles face significant challenges, namely a lack of explainable predictions, limited uncertainty estimation, and inefficient adaptability to changes in the training data. These challenges may limit their further adoption in certain applications, especially in safety-critical or privacy-sensitive domains such as weather forecasting or predictive medical modeling. This dissertation investigates these shortcomings and posits that numerous improvements can be made by analyzing the relationships between the training data and the resulting learned model. By studying the effects of one or many training examples on tree ensembles, we develop solutions for these models that (1) increase their predictive explainability, (2) expand their predictive uncertainty estimation capabilities, and (3) enable their efficient adaptation to changes in the training data.