Speech recognition tasks are often very challenging because, unlike text and images, speech signals are continuous-valued sequences and the boundaries between sound units are not specified. Moreover, model developers must distinguish speech from other ambient noises and cannot rely on a prior lexicon of sound units. To address these challenges, Facebook AI has released HuBERT, a new model for self-supervised speech representation learning. HuBERT alternates clustering and prediction steps: offline k-means clustering generates noisy pseudo-labels from the audio inputs, and a BERT-style model is pre-trained to predict the correct cluster for masked audio segments. The PyTorch repo now includes five pre-trained and/or fine-tuned models ranging from 95M to 1B parameters.
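The clustering-then-masked-prediction loop can be sketched in a few lines. The snippet below is a toy illustration, not the actual HuBERT pipeline: it uses random arrays in place of real acoustic features (HuBERT's first round clusters MFCCs), scikit-learn's k-means in place of the offline clustering step, and simple zeroing in place of the learned mask embedding. The frame counts, cluster count, and mask span are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy stand-in for frame-level acoustic features (e.g., MFCCs):
# 200 frames, 13 dimensions each.
features = rng.normal(size=(200, 13))

# Step 1 (offline clustering): k-means assigns each frame a noisy
# pseudo-label. These cluster ids serve as prediction targets.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)
pseudo_labels = kmeans.labels_            # shape: (200,)

# Step 2 (masked prediction): mask a contiguous span of frames; the
# BERT-style encoder is trained to predict the cluster ids of exactly
# those masked positions from the surrounding unmasked context.
mask = np.zeros(len(features), dtype=bool)
mask[50:70] = True                        # mask frames 50..69
masked_features = features.copy()
masked_features[mask] = 0.0               # hide the masked frames

# Training targets are the pseudo-labels at the masked positions.
targets = pseudo_labels[mask]
print(targets.shape)                      # (20,)
```

In the full method, the two steps alternate: after a round of pre-training, features from an intermediate transformer layer replace the MFCCs as the clustering input, yielding progressively better pseudo-labels.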