Recently developed text-based, self-supervised language models that demonstrate few- and one-shot learning capabilities can adapt their knowledge to new information. Hill et al. extend this capability to a 3D physical domain, where embodied agents navigating a simulated environment acquire similar one-shot word learning abilities through reinforcement learning. By leveraging a new form of explicit external memory (based on the “dual-coding” theory of representation) that writes separate visual and linguistic embeddings for each observation, they equip an agent with both slowly acquired word meanings and “fast mapping” – the ability to re-identify and manipulate an object after a single exposure. They also find that dual-coding can intrinsically motivate agents to learn names for objects, which in turn proves useful for executing downstream tasks.
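The dual-coding idea can be illustrated with a minimal sketch (not the authors' implementation, and the class and method names here are hypothetical): observations are written to memory as paired but separate visual and linguistic embeddings, so that a later query in one modality can retrieve the associated code in the other after only a single write.

```python
import numpy as np

class DualCodingMemory:
    """Toy dual-coding external memory: each slot stores a visual
    embedding alongside its paired linguistic embedding, rather than
    a single fused vector."""

    def __init__(self):
        self.visual_keys = []    # visual embeddings (keys)
        self.language_vals = []  # paired linguistic embeddings (values)

    def write(self, visual_emb, language_emb):
        # One exposure to an object writes both codes side by side.
        self.visual_keys.append(np.asarray(visual_emb, dtype=float))
        self.language_vals.append(np.asarray(language_emb, dtype=float))

    def read(self, visual_query):
        # Soft attention over stored visual keys returns a weighted
        # blend of the paired linguistic embeddings, enabling a form
        # of "fast mapping" from a single prior write.
        keys = np.stack(self.visual_keys)
        q = np.asarray(visual_query, dtype=float)
        scores = keys @ q
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ np.stack(self.language_vals)
```

A query with a view close to a previously written visual key retrieves mostly that key's linguistic embedding; the actual agent learns both embedding spaces end to end with reinforcement learning rather than using fixed vectors as here.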