Nearly every week, we highlight an interesting application of Transformer models, with tasks ranging from NLP to image classification to protein fold prediction. In most cases, a Transformer is trained (often in an unsupervised or weakly supervised manner) on one corpus and then fine-tuned on a dataset from the same modality for a downstream task. In this paper, Lu et al. propose the Frozen Pretrained Transformer (FPT), a Transformer pretrained on natural language that can generalize to other modalities (e.g. images, proteins) without expensive fine-tuning of the self-attention layers. Upon finding that FPT can improve performance and computational efficiency on non-language tasks, they surmise that the self-attention layers learned by a language model may enable efficient universal computation.
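To make the recipe concrete, here is a minimal PyTorch sketch of the general idea, assuming the Hugging Face transformers library and GPT-2 as the language-pretrained body: the self-attention and feedforward blocks stay frozen, while a new input projection, a new output head, the layer norms, and the positional embeddings are trained on the target modality. The class name, dimensions, and patching choices below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model


class FrozenPretrainedTransformer(nn.Module):
    """Sketch of an FPT-style model: a frozen GPT-2 body with small
    trainable input/output layers for a new modality."""

    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        hidden = self.gpt2.config.n_embd  # 768 for the base model

        # Modality-specific layers, trained from scratch.
        self.input_proj = nn.Linear(input_dim, hidden)
        self.output_head = nn.Linear(hidden, num_classes)

        # Freeze the pretrained body except the layer norms ("ln_*")
        # and the positional embeddings ("wpe"), which remain trainable.
        for name, param in self.gpt2.named_parameters():
            param.requires_grad = "ln" in name or "wpe" in name

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. a sequence of flattened
        # image patches or protein-sequence features.
        h = self.input_proj(x)
        h = self.gpt2(inputs_embeds=h).last_hidden_state
        return self.output_head(h[:, -1])  # read out from the final token


if __name__ == "__main__":
    # Toy example: 2 sequences of 64 patch vectors with 16 features each
    # (roughly what 4x4 grayscale patches of a 32x32 image would give).
    model = FrozenPretrainedTransformer(input_dim=16, num_classes=10)
    logits = model(torch.randn(2, 64, 16))
    print(logits.shape)  # torch.Size([2, 10])
```

Because only the small projection layers, layer norms, and positional embeddings receive gradients, fine-tuning in this style touches a tiny fraction of the model's parameters, which is where the computational savings come from.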