Although model developers often focus on feature engineering and model selection, rigorous, intentional data collection can sometimes improve model performance even more. Here, Butrovich et al. describe TScout, a framework they designed to collect training data for the behavior models that power self-driving database management systems (DBMSs), which automate tuning and optimization tasks. Previously, developers of self-driving DBMSs collected this data offline, either by cloning the database and simulating the application with a workload trace or by executing queries with hand-written runners. In contrast, TScout augments offline data with data collected online while the DBMS executes the workload, including hardware-level performance counters, kernel-level observations, and application-level counters. By integrating TScout with the NoisePage DBMS, the authors reduced the error of NoisePage's behavior models by 98%.