Technical Article

Flux-Kontext-Juiced: State-of-the-art Image Editing 5x Faster!

Jun 26, 2026

Nils Fleischmann

ML Research Engineer

John Rachwan

Cofounder & CTO

Gaspar Rochette

ML Research Engineer

Bertrand Charpentier

Cofounder, President & Chief Scientist

David Berenstein

ML & DevRel

Simon Langrieger

ML Research Intern

Flux-Kontext is now open source! It performs image-to-image generation with state-of-the-art quality :) However, each generation takes 14.4 seconds on a single H100. When we learned about this, we were at our offsite to relax together, but we thought it would still be cool to make Flux-Kontext faster.

No problem! We compressed, deployed, and evaluated it in less than 4 hours, and still enjoyed the offsite! Flux-Kontext is now 5x faster with Pruna AI :)

The Productivity Gains: Compressed and Deployed in <4h!

To compress the model, we simply used Pruna Pro, combining caching, factorization, quantization, and compilation algorithms.

In practice, you will be able to reproduce this compression with the following configuration, using the upcoming Flux-Kontext integration in Hugging Face:

from pruna import SmashConfig

smash_config = SmashConfig()
smash_config["quantizer"] = "fp8"            # quantize weights to FP8
smash_config["compiler"] = "torch_compile"   # compile with torch.compile
smash_config["torch_compile_target"] = "module_list"
smash_config["cacher"] = "auto"              # automatic caching of denoising steps
smash_config["objective"] = "quality"        # tune the cache toward output quality
smash_config["auto_cache_mode"] = "bdf"
smash_config["auto_speed_factor"] = 0.4      # lower values trade quality for speed

We used three variations of this configuration to create the Lightly-Juiced, Juiced, and Ultra-Juiced versions of Flux-Kontext. After compression, the model can be deployed in multiple places, for example on our Replicate endpoint. In total, this took less than 4 hours, all while enjoying the Pruna offsite!

🫟 Let’s make our team look like knitted prunes!

The Efficiency Gains: Editing Images 5x Faster!

With these compression configurations, we reach up to 5x speedups over 30 denoising steps! In more detail, for each image we reach:

  • Base: ~14.4s for a 1-megapixel image.

  • Lightly-Juiced: 4.5s for a 1-megapixel image. 3.2x speedup!

  • Juiced: 3.7s for a 1-megapixel image. 3.9x speedup!!

  • Ultra-Juiced: 2.9s for a 1-megapixel image. 4.9x speedup!!!
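As a quick sanity check, the speedup factors follow directly from the timings above (baseline time divided by compressed time, rounded to one decimal place; the headline 4.9x figure for Ultra-Juiced is even slightly conservative, since straight rounding gives 5.0x):

```python
BASELINE_S = 14.4  # base Flux-Kontext: seconds per 1-megapixel image on one H100

timings_s = {
    "Lightly-Juiced": 4.5,
    "Juiced": 3.7,
    "Ultra-Juiced": 2.9,
}

def speedup(seconds, baseline=BASELINE_S):
    """Speedup over the base model, rounded to one decimal place."""
    return round(baseline / seconds, 1)

for name, seconds in timings_s.items():
    print(f"{name}: {seconds}s -> {speedup(seconds)}x faster")
```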

This translates to:

  • A better user experience by reducing generation waiting time,

  • Money savings when scaling deployment to many users,

  • Lower energy consumption by reducing GPU utilization time.
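To make the cost point concrete, here is a rough back-of-the-envelope estimate of GPU cost per image. The $3.00/hour H100 price is a hypothetical assumption for illustration, not a quote:

```python
# Rough serving-cost estimate. The H100 hourly price below is a
# hypothetical assumption for illustration, not an actual quote.
H100_USD_PER_HOUR = 3.00

def cost_per_image(seconds_per_image, usd_per_hour=H100_USD_PER_HOUR):
    """GPU cost of one generation, in USD."""
    return usd_per_hour * seconds_per_image / 3600

base = cost_per_image(14.4)   # base Flux-Kontext
ultra = cost_per_image(2.9)   # Ultra-Juiced
print(f"base:  ${base:.4f}/image")
print(f"ultra: ${ultra:.4f}/image")
print(f"saved per 1M images: ${(base - ultra) * 1_000_000:,.0f}")
```

At these assumed prices, the ~5x speedup cuts the GPU bill for a million generations by several thousand dollars.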

Check out the video yourself to enjoy the speed!

The Generated Images: Unchanged Quality and High Fidelity!

Of course, we like to generate images to check the quality (and have fun!). Here are some samples, but you can find more side-by-side comparisons on this page.

As you can see, the images generated with the base Flux-Kontext are very close to those generated with the juiced versions: the quality remains unchanged!

Enjoy the speed!

Would you like to go further?

Give a star to support us!

Subscribe to Pruna's Newsletter

Curious what Pruna can do for your models?

Whether you're running GenAI in production or exploring what's possible, Pruna makes it easier to move fast and stay efficient.


© 2025 Pruna AI - Built with Pretzels & Croissants 🥨 🥐
