Nearly every issue of Projects to Know has featured at least one paper or project related to Transformer models, which outperform other architectures on several tasks, including automatic speech recognition. Transformer models are typically pre-trained on unlabeled data, fine-tuned on labeled data, and then applied to test data. However, domain mismatch can occur when these datasets (pre-training, fine-tuning, test) are collected through different processes. Here, Hsu et al. find that adding unlabeled pre-training data whose domain matches the test data almost always improves the performance of the wav2vec 2.0 model, which consists of a convolutional feature encoder that maps raw audio to latent speech representations, followed by a Transformer that contextualizes them. These results should interest practitioners, who often have limited access to labeled data for fine-tuning.
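
For readers who want a concrete picture of that two-stage layout, here is a minimal PyTorch sketch. This is not the authors' code; the layer sizes, kernel widths, strides, and head counts are illustrative assumptions rather than the paper's configuration.

```python
# A minimal sketch of the wav2vec 2.0 layout described above: a stack of 1-D
# convolutions encodes raw audio into latent speech representations, which a
# Transformer then contextualizes. All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class TinyWav2Vec(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_layers=2):
        super().__init__()
        # Convolutional feature encoder: raw waveform -> latent frames
        self.feature_encoder = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=3, stride=2), nn.GELU(),
        )
        # Transformer context network over the latent frames
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, waveform):  # waveform: (batch, samples)
        z = self.feature_encoder(waveform.unsqueeze(1))  # (batch, dim, frames)
        return self.transformer(z.transpose(1, 2))       # (batch, frames, dim)

model = TinyWav2Vec()
contextual = model(torch.randn(2, 16000))  # two 1-second clips at 16 kHz
print(contextual.shape)                    # torch.Size([2, 1599, 256])
```

In the actual model, the encoder's latents are also quantized to serve as targets for the contrastive pre-training objective; this sketch shows only the forward pass shared by pre-training and fine-tuning.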