Transformer models, built on self-attention, are clearly replacing CNNs as the model architecture du jour. In this paper, however, Gupta et al. explore whether CNN-based pre-trained models can outperform pre-trained Transformers. Through extensive experimentation on 8 datasets spanning NLP tasks such as toxicity detection, sentiment classification, and news classification, they find that pre-trained convolutions match the performance of pre-trained Transformers in some cases, and that pre-trained convolutional Seq2Seq models outperform them in others. The authors contend that the benefits of pre-training should be studied independently of architectural advances.
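
To make the architectural contrast concrete, here is a minimal sketch (in PyTorch, not taken from the paper) of a convolutional sequence-encoder block of the general kind such models pre-train in place of self-attention; the class name, layer choices, and hyperparameters are illustrative assumptions rather than the authors' exact design.

```python
# Illustrative sketch only (assumed PyTorch implementation, not the authors' code):
# a depthwise-separable convolution block that mixes information across token
# positions where a Transformer block would use self-attention.
import torch
import torch.nn as nn


class ConvSeqBlock(nn.Module):
    """One encoder block: token mixing via a depthwise conv instead of attention."""

    def __init__(self, d_model: int = 512, kernel_size: int = 7, d_ff: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise convolution over the sequence dimension (one filter per channel),
        # padded so the output length matches the input length.
        self.depthwise = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model,
        )
        # Pointwise (1x1) convolution mixes channels, as in separable convolutions.
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.norm2 = nn.LayerNorm(d_model)
        # Position-wise feed-forward sublayer, same as in a Transformer block.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = self.norm1(x).transpose(1, 2)        # -> (batch, d_model, seq_len)
        h = self.pointwise(self.depthwise(h))    # local token mixing, linear in length
        x = x + h.transpose(1, 2)                # residual connection
        return x + self.ffn(self.norm2(x))       # feed-forward + residual


if __name__ == "__main__":
    block = ConvSeqBlock()
    tokens = torch.randn(2, 128, 512)            # (batch, seq_len, d_model)
    print(block(tokens).shape)                   # torch.Size([2, 128, 512])
```

The key structural difference is that token mixing here is local (bounded by the kernel size) and scales linearly with sequence length, whereas self-attention mixes all pairs of positions; the paper's question is whether pre-training narrows the gap between these two choices.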