Case study

Efficient Development, Efficient Production: ROAST Deploys 4.5x Faster AI Models on Modal in 7 Hours with Pruna

Jul 21, 2025

Quentin Sinig

Go-to-Market Lead

Bertrand Charpentier

Cofounder, President & Chief Scientist

ROAST is an AI-powered service that helps people improve their dating profiles. It reviews your profile, suggests better answers to prompts, and even generates unique photos. This last feature, image generation, is where Pruna came in: helping ROAST speed up inference, create a better user experience, and cut the cost of scaling when traffic goes up.

Cutting Down Inference Time by 4.5x

Here’s how ROAST’s workflow works: users upload 5 to 10 photos of themselves. From there, ROAST trains a model on those inputs and generates 10 to 60 final pictures, depending on the product package the user selected. They even offer a premium version with a human in the loop for extra manual editing.

Technically speaking, the setup runs on Modal using A100 GPUs. To generate one image, Flux Dev runs 30 to 50 inference steps, depending on the case, and ROAST dynamically swaps LoRA adapters to personalize generation for each user.
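Per-user personalization like this typically relies on LoRA adapters: small, low-rank weight deltas trained on each user's photos and added to the frozen base weights at inference time, so "swapping adapters" just means swapping which delta is applied. As a rough NumPy illustration of that idea (toy sizes, not ROAST's code):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                      # layer width and LoRA rank (tiny toy sizes)
W = rng.normal(size=(d, d))      # frozen base weight, shared by all users

def lora_delta(rank: int, scale: float = 1.0) -> np.ndarray:
    """One user's adapter: a low-rank update B @ A."""
    A = rng.normal(size=(rank, d))
    B = rng.normal(size=(d, rank))
    return scale * (B @ A)

# Swapping adapters = swapping which low-rank delta is added to W.
user_a = lora_delta(r)
user_b = lora_delta(r)

x = rng.normal(size=d)
y_a = (W + user_a) @ x           # inference with user A's adapter
y_b = (W + user_b) @ x           # same base model, user B's adapter

# The delta has rank <= r, so each adapter stores far fewer
# parameters than the full weight matrix it modifies.
assert np.linalg.matrix_rank(user_a) <= r
```

Because the base weights never change, only the small adapter matrices need to be loaded per user, which is what makes this swap cheap enough to do per request.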

By integrating Pruna, ROAST saw a 4.5x speed-up, going from 1.77 to 5.48 steps/s. Cold starts improved too: when an instance has gone cold during a traffic dip, warm-up now takes around 4 seconds. While the steps-per-second speed-up was the headline number, the full inference pipeline saw a 2.5x to 3x improvement, since it also includes CPU-side processing specifically designed to minimize GPU uptime.
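To put the reported step rates in per-image terms, here is a quick back-of-the-envelope calculation (assuming a 40-step generation, mid-range of the 30 to 50 steps quoted above):

```python
# Reported step rates before and after Pruna optimization.
before_rate = 1.77  # steps per second
after_rate = 5.48   # steps per second

steps = 40  # assumed: mid-range of the 30-50 steps per image

before_s = steps / before_rate
after_s = steps / after_rate

print(f"before: {before_s:.1f}s per image, after: {after_s:.1f}s per image")
# before: 22.6s per image, after: 7.3s per image
```

Multiply that saved GPU time by 10 to 60 images per order and it is easy to see where the cost reduction comes from.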

The first test was conducted with 20 inference steps. Nice results, right?

This means cost savings from reduced GPU uptime, along with a significant boost to user experience by generating images faster, ultimately driving additional revenue through better conversion and improved retention.

From “Heard of Pruna” to “In Production” in 7 Hours

The timeline was as fast as the inference. Wednesday, 2nd April 2025:

  • 🕘 Morning: Benoit hears about Pruna

  • 🕓 4:00 pm: We chat for 15 minutes

  • 🧵 5:13 pm: Slack channel opened

  • 💳 5:27 pm: ROAST purchases Pruna Pro

  • 🛠️ 5:38 pm: We send the recommended config

  • ✅ 11:59 pm: First production test complete

Before midnight, the first test was complete, and the results were already conclusive.

“First test, I saved 20–30 min over 70 min, no quality loss. So very good.”

— Benoit Baylin, Co-founder @ ROAST

Bonus: Want to Try This on Modal?

We obviously won’t share ROAST’s exact stack, but since Modal is becoming a popular choice among our users, we’re putting together a short guide to show you how to install and run Pruna there.

In this guide, we optimize Flux Dev with Pruna Pro on NVIDIA H100 GPUs. Head over to our GitHub (https://github.com/PrunaAI/modal-example) for examples, including a Flux Dev optimization on Modal.
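As a rough sketch of what such a deployment can look like, assuming Modal's class-based app interface and the `SmashConfig`/`smash` API from the open-source `pruna` package (the app name, compiler choice, and pinned packages below are illustrative; ROAST's exact config is not shared):

```python
import modal

app = modal.App("flux-pruna-demo")  # hypothetical app name

# Build an image with the inference stack (versions are illustrative).
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "diffusers", "transformers", "accelerate", "pruna"
)

@app.cls(gpu="H100", image=image)
class FluxModel:
    @modal.enter()
    def setup(self):
        # Runs once per container, so the optimized pipeline is
        # built before the first request and reused afterwards.
        import torch
        from diffusers import FluxPipeline
        from pruna import SmashConfig, smash

        pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")

        # Example Pruna config; the recommended settings depend on
        # your hardware and quality targets.
        config = SmashConfig()
        config["compiler"] = "torch_compile"
        self.pipe = smash(model=pipe, smash_config=config)

    @modal.method()
    def generate(self, prompt: str, steps: int = 30):
        return self.pipe(prompt, num_inference_steps=steps).images[0]
```

Deploying with `modal deploy` then gives you a remote `FluxModel.generate` you can call from anywhere; see the linked repository for a complete, tested version.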

It’s a Wrap

Now you’ve got everything you need to get started, including the exact config we recommended! The only thing missing is a Pruna Pro token. If you're ready to give it a go, head over to our pricing page to get one. Starting at just $0.40/hour with no minimum commitment, it’s one of the cheapest upgrades you can make to your stack.

Once you’ve signed up, you’ll automatically be offered a 1-hour technical onboarding call to help you get set up, just like we did with ROAST. No pressure. But if you’re curious, we’re here!

Subscribe to Pruna's Newsletter

Curious what Pruna can do for your models?

Whether you're running GenAI in production or exploring what's possible, Pruna makes it easier to move fast and stay efficient.
