Model developers often use whitening – a linear transformation that removes correlations between feature dimensions in a dataset – as a data preprocessing step to accelerate convergence and capture contributions from low-variance feature directions. However, Dyer et al. prove, and verify experimentally, that whitening may discard the only information truly useful for prediction on high-dimensional datasets. As a result, models trained on whitened data often generalize poorly. Pure second-order optimization techniques, which take optimization steps based on the curvature of the loss landscape, suffer from the same problem as whitening. However, the authors find that well-regularized second-order optimization can offer a positive practical tradeoff, and can even improve generalization in special circumstances.
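To make the whitening transformation concrete, here is a minimal sketch of one standard variant (ZCA, or symmetric, whitening) using NumPy; the paper discusses whitening generically, so the choice of ZCA and the toy data here are illustrative assumptions, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset with strongly correlated features: the second feature is
# a noisy copy of the first, so the covariance matrix is far from diagonal.
n = 10_000
x0 = rng.normal(size=n)
X = np.stack([x0, x0 + 0.1 * rng.normal(size=n)], axis=1)

def zca_whiten(X, eps=1e-8):
    """Decorrelate features and rescale every direction to unit variance
    by multiplying centered data with the inverse square root of the
    empirical covariance matrix (ZCA / symmetric whitening)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # W = U diag(1/sqrt(lambda)) U^T; eps guards against tiny eigenvalues.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

Xw = zca_whiten(X)
print(np.round(np.cov(Xw, rowvar=False), 3))  # covariance is now ~identity
```

Note that the rescaling by 1/sqrt(lambda) amplifies low-variance directions to the same scale as high-variance ones, which is exactly the mechanism the paper identifies: if those directions carry mostly noise rather than signal, whitening hands them equal influence over training.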