While Transformer models can achieve SOTA performance on many tasks, their self-attention scales quadratically with sequence length, making them memory- and compute-intensive and hard to apply to long input sequences. To address these shortcomings, Wu et al. presented Fast Transformer, which uses an additive attention mechanism to handle longer sequences with linear complexity. Here, Rishit Dagli has open-sourced an implementation of Fast Transformer in TensorFlow.
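To give a feel for why additive attention is linear in sequence length, here is a rough, single-head TensorFlow sketch of the idea (not the repository's actual code; the class name `AdditiveAttentionSketch` and layer layout are illustrative assumptions): instead of comparing every pair of positions, each position is scored once and pooled into a single global vector.

```python
import tensorflow as tf


class AdditiveAttentionSketch(tf.keras.layers.Layer):
    """Simplified, single-head sketch of Fast Transformer-style additive attention."""

    def __init__(self, dim):
        super().__init__()
        self.to_query = tf.keras.layers.Dense(dim)
        self.to_key = tf.keras.layers.Dense(dim)
        self.to_value = tf.keras.layers.Dense(dim)
        # Learned scoring vectors: one scalar score per position (additive attention).
        self.query_attn = tf.keras.layers.Dense(1)
        self.key_attn = tf.keras.layers.Dense(1)
        self.to_out = tf.keras.layers.Dense(dim)
        self.scale = dim ** -0.5

    def call(self, x):
        # x: (batch, seq_len, dim)
        q = self.to_query(x)
        k = self.to_key(x)
        v = self.to_value(x)

        # Pool all queries into one global query vector -- O(n), not O(n^2).
        q_weights = tf.nn.softmax(self.query_attn(q) * self.scale, axis=1)
        global_q = tf.reduce_sum(q_weights * q, axis=1, keepdims=True)

        # Mix the global query into every key element-wise, then pool the keys.
        p = global_q * k
        k_weights = tf.nn.softmax(self.key_attn(p) * self.scale, axis=1)
        global_k = tf.reduce_sum(k_weights * p, axis=1, keepdims=True)

        # Mix the global key into every value, project, and add the queries back.
        u = global_k * v
        return self.to_out(u) + q


# Quick shape check on a toy batch.
layer = AdditiveAttentionSketch(dim=64)
out = layer(tf.random.normal([2, 1024, 64]))
print(out.shape)  # (2, 1024, 64)
```

Every step is either an element-wise product or a softmax-weighted sum over the sequence, so memory and compute grow linearly with input length; see the linked repository for the full multi-head implementation.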