Many overparameterized neural networks achieve high average accuracy but low accuracy on specific subgroups. Prior work has shown that distributionally robust optimization (DRO) can produce models that minimize the worst-case loss over predefined subgroups. Sagawa et al. study group DRO with overparameterized neural networks on natural language inference, facial attribute recognition, and bird photograph recognition. They find that DRO models that attain vanishing training loss do not outperform standard ERM models: while these DRO models may have high average and worst-group training accuracies, their worst-group test accuracies are low because they generalize poorly on the worst group. The authors propose pairing DRO with various forms of increased regularization and achieve 10-40 point improvements in worst-group test accuracy on several tasks.
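To make the distinction between the two objectives concrete, here is a minimal sketch (not the authors' code) contrasting the ERM objective, which averages the loss over all examples, with the group DRO objective, which takes the worst per-group average loss. The loss values and group labels are hypothetical toy data.

```python
import numpy as np

def erm_loss(losses):
    """ERM objective: mean loss over all examples."""
    return float(np.mean(losses))

def group_dro_loss(losses, groups):
    """Group DRO objective: the maximum of the per-group mean losses."""
    group_means = [losses[groups == g].mean() for g in np.unique(groups)]
    return float(max(group_means))

# Toy example: group 1 is a small, hard subgroup.
losses = np.array([0.1, 0.2, 0.1, 2.0, 1.8])  # per-example losses
groups = np.array([0,   0,   0,   1,   1])    # subgroup labels

print(erm_loss(losses))                 # 0.84 -- dominated by the easy majority group
print(group_dro_loss(losses, groups))   # 1.9  -- exposes the hard minority subgroup
```

A model trained to minimize the ERM objective can keep the average low while the minority group's loss stays high; the DRO objective forces attention onto that worst group, which is why, absent regularization, an overparameterized model can simply memorize the worst group's training examples without generalizing to it.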