A common tradeoff when training a reinforcement learning agent is how many training environments to use: creating more environments is labor intensive (developers must hand-specify every degree of variation), but training on fewer environments can make agents generalize poorly. To resolve this tension, Kumar et al. present a framework motivated by human behavior, where people practice and remember diverse solutions to a task to achieve robustness. Unlike approaches that pursue robustness by training a single policy across a distribution of environments, their framework encourages agents to learn a range of policies (or solutions) in a single environment. By optimizing for policy diversity, their approach is more robust to perturbations in the environment and generalizes better to new environments.
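To make the idea of "optimizing for policy diversity" concrete, here is a minimal sketch of one common way to encourage diverse solutions: condition the policy on a sampled latent "skill" and add a bonus for visiting states from which that skill can be identified, so each skill stays near-optimal on the task while remaining distinguishable from the others. This is an illustrative, DIAYN-style construction, not the authors' exact algorithm; the function names, the `ALPHA` weight, and the toy values are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch (not the authors' exact method): the agent samples a
# latent skill z per episode; its reward is the task reward plus a bonus for
# making z identifiable from the visited states, which pushes different
# skills toward different solutions of the same task.

NUM_SKILLS = 4   # number of distinct solutions we want the agent to retain (assumed)
ALPHA = 0.5      # weight on the diversity bonus (hypothetical value)

def diversity_bonus(discriminator_logits: np.ndarray, z: int) -> float:
    """Bonus = log q(z | s) - log p(z): how identifiable the current skill z
    is from the visited state, under a learned discriminator q and a uniform
    prior p over skills."""
    log_q = discriminator_logits[z] - np.log(np.sum(np.exp(discriminator_logits)))
    log_p = -np.log(NUM_SKILLS)
    return float(log_q - log_p)

def augmented_reward(task_reward: float, discriminator_logits: np.ndarray, z: int) -> float:
    """Task reward plus a weighted diversity bonus; maximizing this keeps each
    skill performant on the task while keeping the skills distinct."""
    return task_reward + ALPHA * diversity_bonus(discriminator_logits, z)

# Toy usage: a state where the discriminator is confident it came from skill 2.
logits = np.array([0.1, 0.2, 2.5, 0.0])
print(augmented_reward(task_reward=1.0, discriminator_logits=logits, z=2))  # bonus > 0
print(augmented_reward(task_reward=1.0, discriminator_logits=logits, z=0))  # bonus < 0
```

At deployment, if a perturbation breaks the solution one skill found, the agent can fall back on another of its retained skills rather than retraining from scratch, which is the intuition behind the robustness gains described above.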