Data augmentation, wherein users apply a set of predefined transformation functions to enlarge a training dataset, can significantly improve model performance, although we don't really understand why. Wu et al. study the effect of label-invariant transformations (e.g. rotation, horizontal flip), label-mixing transformations (e.g. mixup), and compositions of label-invariant transformations (e.g. random cropping and rotating followed by horizontal flipping) on a simple, over-parameterized model. They find that label-invariant transformations add new information to the model, thereby reducing its bias, while label-mixing transformations provide a regularization effect (without adding new information), thus reducing the model's variance. Finally, they use these theoretical insights to propose an augmentation approach that prioritizes the transformed data points the model is most uncertain about, and demonstrate their approach's SoTA accuracy on canonical text and image datasets.
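
To make the selection step concrete, here is a minimal sketch of uncertainty-based augmentation sampling: generate several candidate transformations of a batch, score each transformed point by the model's uncertainty, and keep only the most uncertain candidates for training. The function name `select_uncertain_augmentations`, the use of predictive entropy as the uncertainty score, and the top-k selection are illustrative assumptions, not the exact procedure from Wu et al. (their scheme may, for instance, rank candidates by training loss instead).

```python
import torch
import torch.nn.functional as F

def select_uncertain_augmentations(model, x, transforms, k):
    """Return the k transformed points the model is most uncertain about.

    model      -- a classifier mapping inputs to logits
    x          -- a batch of inputs, shape (N, ...)
    transforms -- list of callables, each a label-invariant transformation
    k          -- how many augmented points to keep
    """
    model.eval()
    augmented, scores = [], []
    with torch.no_grad():
        for t in transforms:
            x_aug = t(x)                               # candidate augmented batch
            probs = F.softmax(model(x_aug), dim=1)
            # Predictive entropy as the uncertainty score (an assumption;
            # a loss-based score would need the labels as well).
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            augmented.append(x_aug)
            scores.append(entropy)
    augmented = torch.cat(augmented)                   # (N * len(transforms), ...)
    scores = torch.cat(scores)
    top = scores.topk(k).indices                       # most uncertain candidates
    return augmented[top]
```

In use, the selected points would simply be appended to the next training batch, e.g. `hard = select_uncertain_augmentations(model, images, [hflip, rotate], k=32)` with whatever transformation callables the pipeline already defines (`hflip` and `rotate` here are hypothetical).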