Popular language models that use unsupervised pre-training encode knowledge implicitly in their weights (i.e. they “memorize” information). To capture additional knowledge, model developers must therefore train larger networks, which is computationally expensive and memory-intensive. To make knowledge retrieval more efficient, Google AI researchers have released REALM, a new paradigm for language model pre-training that lets models access knowledge explicitly from raw text documents. REALM combines a language representation model with a neural document retriever, trained jointly on a fill-in-the-blank objective, to find relevant text. This repository contains the code needed to perform the pre-training step of REALM. The maintainers plan to release the full pre-training corpus and retrieval corpus needed to pre-train REALM in the near future.
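The retrieve-then-predict idea above can be sketched in a few lines. This is a minimal toy illustration, not the repository's code: it assumes dense document embeddings scored by inner product, a softmax over those scores as the retrieval distribution, and a marginalization of the fill-in-the-blank prediction over retrieved documents. All array names and sizes here are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy dense embeddings: one query (the masked sentence) and a small corpus.
dim, num_docs, vocab = 8, 5, 10
query_emb = rng.normal(size=dim)
doc_embs = rng.normal(size=(num_docs, dim))

# Retriever: relevance score is an inner product, turned into a
# retrieval distribution p(z | x) via softmax.
scores = doc_embs @ query_emb
p_doc = softmax(scores)

# Hypothetical reader: for each retrieved document, a distribution over
# the vocabulary for the masked token, p(y | x, z).
p_token_given_doc = softmax(rng.normal(size=(num_docs, vocab)))

# Marginalize over documents: p(y | x) = sum_z p(y | x, z) * p(z | x).
p_token = p_doc @ p_token_given_doc
print(p_token.sum())  # a valid distribution: sums to 1
```

Because the document probabilities flow through this marginal, the gradient of the fill-in-the-blank loss reaches the retriever, which is what lets REALM train retrieval without labeled query-document pairs.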