Most ML researchers and practitioners believe that the performance of Transformer models correlates with their size: larger models yield better results. To study this phenomenon, Google researchers trained PaLM, a 540-billion-parameter language model, on 780 billion tokens of high-quality text using Pathways, an ML system designed to facilitate efficient pipeline-free training at scale across thousands of accelerator chips. Upon evaluating PaLM on hundreds of NLP, code, and mathematical reasoning tasks, they found that it achieves significantly better performance than prior language models and exhibits new capabilities, such as explicitly interpreting and explaining complex reasoning. What's more, they observed discontinuous improvements when scaling from 8B to 62B to 540B parameters and therefore predict that bigger models may continue to exceed SOTA performance and demonstrate new abilities.