Diffusion models generate images through a process that first perturbs input data with noise and then iteratively denoises it using a deep neural network. Although diffusion models achieve state-of-the-art results on image-synthesis tasks, they can be prohibitively expensive to train and evaluate because they operate sequentially on the full-resolution image. Latent diffusion models instead run the diffusion process on a compressed image representation, which reduces training and inference costs. Stable Diffusion is a latent diffusion model, developed and open-sourced by researchers at LMU Munich and Amplify portfolio company RunwayML, that generates and modifies images based on text prompts. Although the model was trained on Stability AI's Ezra-1 ultracluster of 4,000 A100 GPUs, it can run in under 10GB of VRAM on consumer GPUs and generate 512×512-pixel images in seconds.
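The perturb-then-denoise loop can be sketched in a few lines. The snippet below is a toy illustration, not Stable Diffusion's actual training or sampling code: `denoise_step` is a hypothetical placeholder for the trained neural network that predicts and removes noise at each step, and the 8×8 array stands in for an image. It also computes why the latent trick pays off: Stable Diffusion's autoencoder compresses a 512×512×3 pixel image into a 64×64×4 latent, so each denoising step touches roughly 48× fewer values.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, t, num_steps=50):
    """Forward process (sketch): blend the clean signal with Gaussian noise.
    At t = num_steps the sample is almost pure noise."""
    alpha = 1.0 - t / num_steps            # fraction of the signal remaining
    return alpha * x + (1.0 - alpha) * rng.standard_normal(x.shape)

def denoise_step(x, t, num_steps=50):
    """Reverse process placeholder: a real diffusion model uses a trained
    DNN to predict the noise; here we just shrink toward zero mean."""
    return x - x.mean() / num_steps

def sample(shape, num_steps=50):
    """Start from pure noise and iteratively denoise, step by step."""
    x = rng.standard_normal(shape)
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t, num_steps)
    return x

img = sample((8, 8))                       # tiny "image" for illustration
print(img.shape)                           # → (8, 8)

# Why latent diffusion is cheaper: values processed per step.
pixel_values = 512 * 512 * 3               # pixel-space image
latent_values = 64 * 64 * 4                # Stable Diffusion's latent
print(pixel_values // latent_values)       # → 48
```

The ~48× reduction in values per step is why the full model fits in under 10GB of VRAM and samples in seconds rather than minutes.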