In reinforcement learning, reward is typically maximized by concentrating a policy's probability mass on the sequence of actions with the highest return. However, in some scenarios (often where exploration or diversity matters), users instead want to sample a varied set of trajectories (sequences of actions) – for instance, in the preclinical stages of drug development, researchers screen many candidate molecules rather than a single best one. To do this, Bengio et al. propose transforming a positive reward or return function into a generative policy that samples with probability proportional to the return. The authors present GFlowNet, a new generative model that treats the generative process as a flow network. They show how this approach yields a policy that samples from the desired distribution and validate GFlowNet on a large-scale molecule synthesis task.
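As a rough illustration of what "sampling with probability proportional to the return" means, here is a minimal Python sketch on a toy environment that builds binary strings one bit at a time, with an arbitrary positive reward on complete strings. The state graph here is a tree, so the flow through each partial state can be computed exactly as the sum of rewards of its terminal descendants; in the paper's setting the state graph is a general DAG and a neural network is trained (with a flow-matching objective) to approximate these flows. All names below (`reward`, `flow`, etc.) are illustrative, not from the paper.

```python
import itertools
import random
from collections import Counter

# Toy environment: build a binary string of length N one bit at a time.
# Each complete string x gets a positive reward R(x); the goal is a policy
# that samples x with probability proportional to R(x).

N = 3

def reward(bits):
    # Arbitrary positive reward: strings with more 1s are more rewarding.
    return 1.0 + 2.0 * sum(bits)

def flow(prefix):
    # Flow through a partial state = sum of rewards of all terminal states
    # reachable from it (exact here, because the state graph is a tree).
    remaining = N - len(prefix)
    return sum(reward(prefix + rest)
               for rest in itertools.product((0, 1), repeat=remaining))

def sample_trajectory():
    # Forward policy: P(child | state) = F(child) / F(state).
    prefix = ()
    while len(prefix) < N:
        children = [prefix + (0,), prefix + (1,)]
        flows = [flow(c) for c in children]
        total = sum(flows)
        prefix = random.choices(children, weights=[f / total for f in flows])[0]
    return prefix

# Empirical sampling frequencies approach R(x) / sum_x' R(x').
n_samples = 50_000
counts = Counter(sample_trajectory() for _ in range(n_samples))
Z = sum(reward(x) for x in itertools.product((0, 1), repeat=N))
for x in sorted(counts):
    print(x, f"empirical={counts[x] / n_samples:.3f}", f"target={reward(x) / Z:.3f}")
```

Running this shows the empirical frequencies matching the normalized rewards, which is the behavior GFlowNet aims for: high-reward objects are sampled most often, but lower-reward ones still appear, giving the diverse candidate sets desired in settings like molecule screening.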