Although reinforcement learning (RL) systems can outperform humans in games like chess and Atari, they do not learn as efficiently or generalize as robustly as humans do. To develop RL agents that can, for example, learn to play a game within minutes, Tsividis et al. seek to imitate how humans learn, including through strong inductive biases and an intrinsic desire to understand causal relations. They present a new “theory-based RL” approach, which they implement in a new architecture: the Exploring, Modeling, and Planning Agent (EMPA). EMPA applies Bayesian inference to learn probabilistic generative models and nearly matches human efficiency on a set of Atari-style games.
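
To make the idea of Bayesian inference over candidate game rules more concrete, here is a minimal toy sketch. It is not EMPA's implementation: the hypothesis names, the probabilities in `P_DEATH`, and the helper functions `bayes_update` and `likelihood` are all assumptions invented for illustration. It only shows how a few observations can sharply shift belief between two competing "theories" of what an object does.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], lik: Dict[str, float]) -> Dict[str, float]:
    """Return the posterior P(h | obs), proportional to P(obs | h) * P(h)."""
    unnormalized = {h: prior[h] * lik[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical candidate rules for what happens when the avatar touches an object.
# Each value is an assumed P(avatar dies on contact | rule).
P_DEATH = {"harmless": 0.05, "lethal": 0.95}


def likelihood(observed_death: bool) -> Dict[str, float]:
    """Probability of the observation under each candidate rule."""
    return {h: (p if observed_death else 1.0 - p) for h, p in P_DEATH.items()}


# Start from a uniform prior and observe two contacts that both end in death.
posterior = {"harmless": 0.5, "lethal": 0.5}
for observed_death in [True, True]:
    posterior = bayes_update(posterior, likelihood(observed_death))

print(posterior)
```

In this toy setting, two observations are enough to make the "lethal" rule far more probable than the "harmless" one, which hints at why strong priors over a small space of structured hypotheses can support learning from very little data.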