Hidden Markov models (HMMs), which separate the hidden state sequence from the observations, are a classic choice for sequence modeling. However, this separation makes it challenging to scale HMMs efficiently to large language-modeling datasets. To address this issue, Chiu and Rush propose three techniques for scaling HMMs to tens of thousands of states without sacrificing efficient exact inference: a modeling constraint, a neural parameterization to improve generalization, and a variant of dropout.
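To see why state count is the bottleneck, recall that exact inference in an HMM runs the forward algorithm, whose cost is O(T·K²) for a length-T sequence and K states; the K² transition term is what makes tens of thousands of states expensive. The sketch below is a generic, minimal forward pass in log space (not the authors' implementation, and without their scaling techniques); all function and variable names are illustrative.

```python
import numpy as np

def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x))) along an axis."""
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis) if axis is not None else s.item()

def hmm_forward_loglik(log_pi, log_A, log_B, obs):
    """Exact log-likelihood of `obs` under an HMM via the forward algorithm.

    log_pi: (K,)   log initial-state probabilities
    log_A:  (K, K) log transition probabilities, rows = previous state
    log_B:  (K, V) log emission probabilities over a V-word vocabulary
    Cost is O(T * K^2): each step sums over all K previous states
    for each of the K current states.
    """
    # alpha[k] = log p(obs[:t+1], state_t = k)
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # The K^2 term: marginalize over the previous state.
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(alpha)
```

For a toy 2-state, 2-word HMM this matches brute-force enumeration over all state sequences; the paper's contribution is making this same exact computation tractable when K is in the tens of thousands.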