
Deep Dive Into Flux: Everything You Need to Know!

Nov 27, 2024

Johanna Sommer

ML Researcher

Quentin Sinig

Go-To-Market Lead

Over the past few weeks, we've been receiving three to five requests per week from companies curious about our public quantized versions of Flux. After discovering that these optimized models can fit on an A10G GPU, they came to us with detailed benchmarking questions: How much faster is it? Can it handle any resolution or batch size without quality loss? Is it compatible across various hardware setups? The sudden surge in interest genuinely caught us off guard. So, naturally, we decided to dive deeper and get to the bottom of what's driving this excitement.

Why We Believe Flux Is More Than Just Hype

The Flux model (and all its variations) is making headlines in the world of image generation, and for good reason. With a whopping 12 billion parameters under its hood, Flux is described as a “rectified flow transformer.” But what does that mean in simpler terms?

Let’s start with the concept of a “rectified flow.” Flux operates similarly to diffusion models, which generate high-quality images from random noise. Flux, however, is built on the concept of “Flow Matching” and takes a more direct and efficient route than traditional diffusion models. While those older models might meander through complex, winding paths to create an image, Flux takes the straight road: it moves directly from noise to image along a straight path in its internal representation space. Because Flux doesn’t take unnecessary detours, it can produce high-quality images with fewer steps, meaning much faster image generation. Imagine getting from point A to point B without taking the scenic route; you arrive much quicker!

Figure: Diffusion models only start producing higher-quality images in the later steps, whereas Flow Matching achieves this much earlier. (Source: https://arxiv.org/pdf/2210.02747)
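To make the “straight path” idea concrete, here’s a minimal sketch of rectified-flow training and sampling. Note that velocity_model is a hypothetical stand-in for the trained network, not Flux’s actual API:

import torch

def rectified_flow_loss(velocity_model, noise, image):
    # Pick a random time t and a point on the straight line from noise to image
    t = torch.rand(noise.shape[0], 1, 1, 1)
    x_t = (1 - t) * noise + t * image
    # The target velocity is constant along that line: image - noise
    target = image - noise
    prediction = velocity_model(x_t, t.flatten())
    return torch.mean((prediction - target) ** 2)

def euler_sample(velocity_model, noise, num_steps=4):
    # Integrate the learned ODE from noise (t=0) to image (t=1)
    x, dt = noise, 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + velocity_model(x, t) * dt
    return x

Because the learned paths are (nearly) straight, even a handful of Euler steps lands close to the target image, which is exactly why Flux gets away with so few sampling steps.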

Now, let’s talk about the “transformer” part of Flux. This refers to the type of neural network architecture it uses to go from point A to point B. Despite handling a massive number of parameters, the transformer-block design helps Flux stay efficient. These layers allow Flux to manage complex spatial relationships within images more effectively, enhancing both speed and quality.
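Want to sanity-check that parameter count yourself? The transformer backbone is exposed directly on the diffusers pipeline. A quick sketch, assuming you have the weights downloaded (the count should come out around 12B):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
num_params = sum(p.numel() for p in pipe.transformer.parameters())
print(f"Transformer parameters: {num_params / 1e9:.1f}B")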

When we talk efficiency, the Flux developers did not stop there. Good job, Black Forest Labs! In addition to the FLUX.1 [dev] version, we have access to FLUX.1 [schnell], which was trained with latent adversarial diffusion distillation, meaning it can generate high-quality images in only 1 to 4 steps.
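In practice, the distillation shows up directly in the sampler settings. Here’s a hedged side-by-side, using typical settings in the style of the diffusers documentation (the step counts are common defaults, not hard requirements):

import torch
from diffusers import FluxPipeline

prompt = "A mountain lake at sunrise"

# FLUX.1 [dev]: guidance-distilled, typically run with a few dozen steps
dev_pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
image_dev = dev_pipe(prompt, guidance_scale=3.5, num_inference_steps=50).images[0]

# FLUX.1 [schnell]: timestep-distilled, 1 to 4 steps with guidance disabled
schnell_pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cuda")
image_schnell = schnell_pipe(prompt, guidance_scale=0.0, num_inference_steps=4).images[0]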

As an ML Researcher who worked on Flow Matching models during my PhD, I've made my choice, and Flux is a resounding yes! However, the community's opinions do vary. If you browse some of the discussions on Reddit (like this one), you'll notice that some researchers are still debating whether Flow Matching truly outperforms traditional diffusion models in terms of innovation, speed, and quality. It's a hot topic, and the differing viewpoints make for an interesting read!

Introducing the “Flux Playground”

With all the buzz happening, the questions we've received, and the benchmarks we've conducted, we decided to build a mini app to showcase our findings. Brace yourself: yes, it’s a test URL, and yes, the design is simple—but the value it delivers made us go public with it. The app compares the Schnell and Dev base models with optimized versions like Turbo and Fast, using over 60 prompts. We included speed-up and cost metrics, plus direct image comparisons to assess image quality. By the way, since quality evaluation can be subjective, we've integrated a feature for customers who want to dive deeper, enabling real-time comparison votes from a panel of real people.
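And if you'd rather reproduce the speed numbers on your own hardware, a simple wall-clock benchmark gets you most of the way there. This is a minimal sketch (assuming a CUDA GPU), not our exact harness:

import time
import torch

def time_pipeline(pipe, prompt, num_inference_steps, warmup=1, runs=3):
    # Warm-up runs so compilation and caching don't pollute the measurement
    for _ in range(warmup):
        pipe(prompt, num_inference_steps=num_inference_steps)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        pipe(prompt, num_inference_steps=num_inference_steps)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs  # average seconds per image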

Getting Started With Flux and Pruna AI

4 steps, 28 lines of code. That's all it takes, and lucky for you, we love sharing code snippets to make your life easier (this snippet uses compilation, but there's also a "quantization tutorial" if that's what you're after)! Just a heads-up: you'll need a token (see line 16: # replace <your_token> with your actual token or None if you do not have one). But no worries: just drop your email, and you'll automatically receive one. Yup, that's a new feature, and we're teasing it a bit here! 😉

import torch
from diffusers import FluxPipeline
from pruna import SmashConfig, smash

# Load the Flux model
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to('cuda')

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compilers'] = ['onediff']

# Smash the model
pipe.transformer = smash(
    model=pipe.transformer,
    token='<your_token>',  # replace <your_token> with your actual token or None if you do not have one
    smash_config=smash_config,
)

# Run the model on a given input
prompt = "A cat holding a sign that says hello world"
pipe(
    prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
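One last practical note: the pipeline call returns a PIL image, so if you run this as a script rather than in a notebook, assign the result and save it to disk:

image = pipe(
    prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux_schnell_cat.png")  # the filename is just an example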

As we wrap up, let’s take a step back and look at how theory meets reality. We recently published a case study with Each AI, and we think it perfectly illustrates how the technical considerations we've discussed translate into impactful, real-life use cases. Check out the full blog post to learn how Each AI went from zero to production in just days, achieving an impressive 3x improvement in both cost and speed. If you're inspired to get started yourself, here’s the Getting Started Documentation to make it work for you too!

Or you can simply stop by the Discord, say Hi, and discuss tech with our team! We host office hours every Tuesday from 1:30 to 2:30pm CEST :)

Wanna Go Deeper?

At Pruna AI, our team of Researchers and PhDs shares a passion for deep scientific exploration, and we love providing additional resources for those who share the same DNA of curiosity. For anyone eager to dive deeper into the science behind Flow Matching and the innovative techniques used in Flux, we’ve curated a list of recommendations.



Speed Up Your Models With Pruna

Inefficient models drive up costs, slow down your productivity and increase carbon emissions. Optimize with Pruna. Make your AI more accessible and sustainable.


© 2024 Pruna AI - Built with Pretzels & Croissants 🥨 🥐
