8-bit quantization methods can make large pretrained language models more memory efficient; however, they degrade performance and require additional tuning. Moreover, degradation-free quantization techniques have not been studied in models with more than 350M parameters. Here, Dettmers et al. present the first multi-billion-parameter-scale, degradation-free quantization procedure for LLMs. Their procedure, LLM.int8(), converts the feed-forward and attention projection layers of a 175B-parameter Transformer with 16- or 32-bit weights to 8-bit weights by leveraging vector-wise quantization and mixed-precision decomposition.
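The two ingredients can be illustrated with a minimal NumPy sketch: per-row absmax scaling maps each vector independently to int8, while columns containing outlier features above a magnitude threshold are split off and kept in higher precision. This is a simplified illustration of the idea, not the paper's implementation; the function names, the threshold value, and the column-wise outlier criterion are assumptions for the example.

```python
import numpy as np

def vectorwise_quantize(X, threshold=6.0):
    """Sketch of vector-wise absmax int8 quantization with a
    mixed-precision split (hypothetical, simplified)."""
    # Mixed-precision decomposition: columns with any entry whose
    # magnitude exceeds the threshold are treated as outlier features
    # and kept in full precision.
    outlier_cols = np.any(np.abs(X) > threshold, axis=0)
    X_regular = X[:, ~outlier_cols]
    X_outlier = X[:, outlier_cols]  # stays in fp16/fp32

    # Vector-wise quantization: one absmax scale per row, mapping
    # each row of the regular part into the int8 range [-127, 127].
    scales = np.abs(X_regular).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # guard against all-zero rows
    X_int8 = np.round(X_regular / scales).astype(np.int8)
    return X_int8, scales, X_outlier, outlier_cols

def dequantize(X_int8, scales):
    """Recover an approximation of the regular (non-outlier) part."""
    return X_int8.astype(np.float32) * scales
```

Per-row scales keep the quantization error bounded by half a scale step per entry, which is what lets the regular bulk of the matrix multiply run in int8 while the few outlier columns run in 16-bit.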