Batch normalization (BatchNorm) enables model developers to accelerate CNN training by using higher learning rates. However, implementing BatchNorm involves several decisions that can significantly impact model performance. Unlike most other DL operators, BatchNorm operates on batches of data rather than on individual samples, so its output for a given sample may depend on which other samples share its batch. Yuxin Wu and Justin Johnson study how different choices for this batch (in both training and inference) impact model performance. They show that common implementations of BatchNorm may degrade model performance, whereas less conventional choices of normalization batch can yield significant performance benefits.
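To make the batch dependence concrete, here is a minimal NumPy sketch (not from the paper) of the training-time normalization step. The helper `batchnorm_train` is a hypothetical, stripped-down version with no learned scale/shift and no running statistics; it only illustrates that the normalized output for the *same* sample changes when that sample is grouped with different samples.

```python
import numpy as np

def batchnorm_train(x, eps=1e-5):
    # Hypothetical helper: normalize each feature using statistics
    # computed over the batch dimension (axis 0), as BatchNorm does
    # during training. Learned affine parameters and running-statistic
    # updates are omitted for clarity.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

sample = np.array([[1.0, 2.0]])
batch_a = np.vstack([sample, [[0.0, 0.0]]])    # sample grouped with small values
batch_b = np.vstack([sample, [[10.0, 20.0]]])  # same sample, different batch

# The output for the same input sample differs because the batch statistics differ.
print(batchnorm_train(batch_a)[0])  # approximately [ 1.,  1.]
print(batchnorm_train(batch_b)[0])  # approximately [-1., -1.]
```

Because the per-batch mean and variance change with the batch composition, any decision about how the batch is formed (its size, which samples it mixes, whether training or inference statistics are used) feeds directly into the operator's output.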