The Lottery Ticket Hypothesis tells us that only a subset of weights in a model determines its accuracy. However, most methods for finding these sparse representations first train a dense network and then prune the learned model (dense-to-sparse), which is computationally inefficient. In this work, Jayakumar et al. introduce Top-KAST, a training method that reduces the cost of training by eliminating both the dense forward pass and the dense gradient computation. Top-KAST keeps only the largest-magnitude weights during the forward pass, computes gradients for a slightly larger set of weights in the backward pass, and applies an auxiliary exploration loss to avoid fixating on a suboptimal subnetwork. The authors demonstrate its applications to language modeling and image recognition.
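A minimal sketch of the magnitude-based top-k masking idea described above, assuming a `topk_mask` helper and illustrative sparsity levels that are not from the paper (the real method applies masks per layer inside a full training loop):

```python
import numpy as np

def topk_mask(weights, sparsity):
    """Boolean mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = max(1, int(round(weights.size * (1.0 - sparsity))))
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return np.abs(weights) >= threshold

# Hypothetical sparsities: keep 10% of weights for the forward pass,
# and a larger 20% set when computing gradients in the backward pass.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))

forward_mask = topk_mask(W, sparsity=0.9)
backward_mask = topk_mask(W, sparsity=0.8)

# Because both sets are chosen by magnitude with different thresholds,
# the backward set is a superset of the forward set: weights just below
# the forward threshold still receive gradients and can re-enter later.
assert np.all(backward_mask[forward_mask])
```

Applying gradients to this wider backward set is what lets the active subnetwork change over training instead of being frozen at initialization.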