SentAugment is a data augmentation technique for semi-supervised learning that can improve performance on multiple language understanding tasks. It enables self-training and knowledge distillation by identifying domain-specific unannotated sentences through vector-based search. Specifically, SentAugment embeds sentences with an encoder based on the Transformer implementation of XLM, then retrieves in-domain unannotated sentences from a large-scale sentence embedding space using approximate nearest neighbor (ANN) search.
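The embed-then-retrieve step can be illustrated with a minimal sketch. This is not the SentAugment implementation: the `embed` function is a toy bag-of-words stand-in for the XLM-based encoder, `retrieve` uses exact brute-force cosine search where a real system would use an ANN index, and the sentences, function names, and `k` value are all hypothetical.

```python
import math
from collections import Counter


def embed(sentence, vocab):
    """Toy stand-in for the sentence encoder: an L2-normalized bag-of-words vector."""
    counts = Counter(sentence.lower().split())
    vec = [float(counts.get(word, 0)) for word in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def retrieve(query_vec, bank_vecs, k):
    """Return indices of the k bank vectors with the highest cosine similarity.

    Exact brute-force search (assumption: used here only for clarity);
    a large-scale system would query an ANN index instead.
    """
    scores = [sum(q * b for q, b in zip(query_vec, vec)) for vec in bank_vecs]
    ranked = sorted(range(len(bank_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical data: a few task sentences and a bank of unannotated sentences.
task_sentences = ["the movie was great", "the film was terrible"]
bank = [
    "stock prices fell sharply today",
    "this movie was really great fun",
    "the film was boring and terrible",
    "rain is expected over the weekend",
]

# Shared vocabulary so that all vectors live in the same space.
vocab = sorted({w for s in task_sentences + bank for w in s.lower().split()})
bank_vecs = [embed(s, vocab) for s in bank]

# Retrieve the nearest unannotated sentence for each task sentence.
retrieved = set()
for sentence in task_sentences:
    retrieved.update(retrieve(embed(sentence, vocab), bank_vecs, k=1))

in_domain = [bank[i] for i in sorted(retrieved)]
print(in_domain)
```

In a self-training or distillation setup, the retrieved sentences would then be labeled by a teacher model and added to the training data for the student.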