Stable Cascade: A New Image Generation Model

Cover Image for Stable Cascade: A New Image Generation Model
Stable Cascade
Stable Cascade

Stable Cascade is a text-to-image generation model developed by Stability AI, which is designed to produce realistic and diverse images from natural language prompts. It is based on a three-stage approach, comprising stages A, B, and C, that work together using the Würstchen architecture. The first stage, stage C, compresses the text prompt into a latent code, the second stage, stage A, decodes the latent code into a low-resolution image, and the third stage, stage B, refines the low-resolution image into a high-resolution image. This approach reduces the memory and computational requirements and speeds up the image creation time. Stable Cascade can be used for various applications such as content creation, design, education, and entertainment. It can generate images of fictional characters, landscapes, animals, logos, or anything else described with text. Additionally, it can be used to enhance or modify existing images, such as increasing their resolution, changing their style, adding or removing objects, or creating new images from their edges.

Stable Cascade is available on GitHub for research purposes only and is not for commercial use. Stability AI has provided a Colab notebook that demonstrates how to use Stable Cascade for various image generation tasks. The model is released under a non-commercial license that permits non-commercial use only. Stability AI also offers its models via API for commercial use, but Stable Cascade is not yet part of that offering.

The model is based on the "Würstchen" (Sausage) architecture introduced in January 2024. It is a three-stage diffusion-based text-image synthesis that learns a highly compressed but detailed semantic "image recipe" (Stage C) that drives the diffusion process (Stage B). According to Stability AI, this compact representation provides much more detailed guidance compared to latent language representations, reducing computational effort while improving image quality.

Stable Cascade can be adapted to specific needs by accessing the checkpoints, inference scripts, and fine-tuning scripts available on the Stability GitHub page. The model's modular approach allows for experimentation and customization to achieve desired results.

For more information and updates on Stable Cascade, you can follow Stability AI on their social media platforms such as Twitter, Instagram, LinkedIn, and join their Discord Community