Storage services in data centers make decisions, based on predictions about future workload and system behavior, to optimize cache hit rate, disk footprint, and other metrics. These decisions are often driven by heuristics that reflect statistical properties of the workload (e.g., temporal or spatial locality). In contrast, Giulio Zhou and Martin Maas propose using multi-task machine learning to extract data from unstructured distributed traces, so that storage services can leverage application-level information automatically.