Real-world applications of machine learning often fail when distribution shifts occur. However, most datasets used in ML research either do not reflect distribution shifts at all or contain only artificial shifts that are unlikely to arise in production environments. In response, researchers from Stanford, Berkeley, Caltech, Cornell, and Microsoft Research have released WILDS, a benchmark of in-the-wild distribution shifts that cause the performance of baseline models to degrade. The WILDS benchmark includes 7 datasets spanning different data modalities (text, images) and use cases (e.g. sentiment analysis, tumor identification), as well as evaluators that automatically calculate the metrics reported on the WILDS leaderboard.
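
The benchmark ships with a Python package, `wilds`, that handles downloading the datasets, constructing the official splits, and computing the leaderboard metrics. The sketch below shows how one of the datasets might be loaded and iterated over; the dataset name (`"camelyon17"`), the transform, and the exact call signatures are assumptions based on the package's documented interface and may differ across versions.

```python
# Minimal sketch, assuming the interface of the `wilds` Python package.
import torchvision.transforms as transforms

from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader

# Download and load one benchmark dataset (Camelyon17, tumor identification).
dataset = get_dataset(dataset="camelyon17", download=True)

# The benchmark fixes the splits; "train" here, with out-of-distribution
# validation and test splits also defined by the dataset.
train_data = dataset.get_subset("train", transform=transforms.ToTensor())

# Standard (i.i.d.) loader; WILDS also provides group-aware loaders.
train_loader = get_train_loader("standard", train_data, batch_size=32)

# Each batch carries metadata (e.g. which hospital a slide came from),
# which the bundled evaluator (dataset.eval) uses to compute the
# leaderboard metrics on held-out predictions.
for x, y_true, metadata in train_loader:
    pass  # train a baseline model here
```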