As interest in Transformer models grows, more and more researchers are exploring ways to make these architectures computationally efficient. For example, Tao Lei presents SRU++, which combines fast recurrence with attention-based sequence modeling to achieve near state-of-the-art performance while training more efficiently. These results suggest that by using less attention and relying on fast recurrence, researchers can speed up both training and inference.
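To illustrate why this kind of recurrence is fast, the sketch below implements a minimal SRU-style layer in NumPy. This is an illustrative sketch, not the paper's implementation: all matrix multiplications (candidate values and gates) are computed for every timestep in parallel up front, so the sequential loop contains only cheap element-wise operations. The function name `sru_layer` and the weight names are my own; in SRU++ the candidate projection would instead come from an attention sub-layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(x, W, Wf, bf, Wr, br):
    """Minimal SRU-style recurrence (illustrative sketch only).

    x: (T, d) input sequence; W, Wf, Wr: (d, d) weights; bf, br: (d,) biases.
    The heavy matrix multiplies are batched over all timesteps; only
    element-wise gating remains inside the sequential loop.
    """
    T, d = x.shape
    u = x @ W                    # candidate values, computed in parallel
    f = sigmoid(x @ Wf + bf)     # forget gates, computed in parallel
    r = sigmoid(x @ Wr + br)     # highway/output gates, computed in parallel
    c = np.zeros(d)
    h = np.empty_like(x)
    for t in range(T):           # only element-wise ops along the sequence
        c = f[t] * c + (1.0 - f[t]) * u[t]          # internal state update
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]  # highway connection
    return h
```

Because the per-timestep work is element-wise, the loop parallelizes across the hidden dimension, which is the core of the "fast recurrence" claim; SRU++ then spends its remaining compute on a small amount of attention to recover modeling power.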