Researchers and industry practitioners remain captivated by Transformer models, often pretrained on billions of words, and by their application to NLP tasks. However, few have studied how the amount of pretraining data affects the capabilities these models acquire. To explore this question, Zhang et al. apply four probing methods (classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks) to the MiniBERTas, RoBERTa models pretrained from scratch on 1M, 10M, 100M, and 1B words. They find that although these models can learn syntactic and semantic features from pretraining sets of just 10M-100M words, billion-word pretraining sets are needed to acquire the commonsense knowledge that drives more substantial gains on downstream NLU tasks.
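To make the first of these methods concrete, here is a minimal, self-contained sketch of classifier probing: a simple (here, logistic-regression) classifier is trained on frozen model representations, and high probe accuracy is taken as evidence that the representations encode the probed linguistic property. The embeddings below are synthetic stand-ins for real contextual representations (a hypothetical binary property, e.g. noun vs. verb, encoded in one dimension plus noise), not MiniBERTa outputs.

```python
import math
import random

random.seed(0)
DIM = 8

def fake_embedding(label):
    # Stand-in for a frozen contextual embedding: the binary property
    # signal lives in dimension 0, the rest is noise.
    vec = [random.gauss(0, 1) for _ in range(DIM)]
    vec[0] += 2.0 if label == 1 else -2.0
    return vec

data = [(fake_embedding(y), y) for y in [0, 1] * 200]
random.shuffle(data)
train, test = data[:300], data[300:]

# Logistic-regression probe trained by gradient descent.
# Crucially, only the probe's weights are updated; the "model
# representations" (our vectors) stay frozen throughout.
w = [0.0] * DIM
b = 0.0
lr = 0.1
for _ in range(100):
    for x, y in train:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y  # gradient of log loss w.r.t. z
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

acc = sum(predict(x) == y for x, y in test) / len(test)
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy suggests the property is linearly decodable from the representations; comparing probe accuracy across MiniBERTas pretrained on different amounts of data is what lets one track how features emerge with more pretraining.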