Compatibility Layer

Compatible With Any AI Development Workflow Through Our Integrations

Get the same expertise as an in-house optimization team without blocking your team's productivity. Pruna integrates directly into your ML pipeline.

pip install pruna
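
Once installed, optimization is a single call. Below is a minimal sketch of the typical workflow, assuming Pruna's documented smash API; the specific cacher and compiler names are illustrative and may vary by Pruna version.

from diffusers import StableDiffusionPipeline
from pruna import SmashConfig, smash

# Load the model you already use in your pipeline (placeholder checkpoint id).
pipe = StableDiffusionPipeline.from_pretrained("your-org/your-diffusion-model")

# Choose which optimization algorithms to combine.
smash_config = SmashConfig()
smash_config["cacher"] = "deepcache"      # illustrative algorithm choice
smash_config["compiler"] = "stable_fast"  # illustrative algorithm choice

# One call returns an optimized model that drops into your existing code.
smashed_pipe = smash(model=pipe, smash_config=smash_config)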

Deploy Pruna Your Way, Self-Hosted or in the Cloud

Get faster inference without the trial-and-error process.

Choose the flexibility that suits you: Pruna can be self-hosted (with or without Docker), launched directly from the AWS Marketplace, or deployed via Koyeb, Replicate, and more.

Self Hosted
Docker-Based
Hardware-Agnostic
EC2
Lambda
SageMaker
Replicate
Koyeb
Modal
TritonServer
vLLM
ComfyUI

Hot Swapping of LoRAs, No Compilation Warmups

LoRAs are an extremely convenient tool for improving your model's capabilities. However, swapping LoRAs can trigger a fresh compilation pass. Pruna ensures that Diffusers-driven LoRAs stay efficient and don't cause recompilation warmups.
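
As an illustration, here is a hedged sketch of hot-swapping LoRAs on a Pruna-optimized Diffusers pipeline. The repository ids are placeholders, and it assumes the optimized pipeline exposes the standard Diffusers LoRA methods (load_lora_weights, unload_lora_weights).

from diffusers import DiffusionPipeline
from pruna import SmashConfig, smash

pipe = DiffusionPipeline.from_pretrained("your-org/your-diffusion-model")  # placeholder id

# Compile once; Pruna keeps the compiled graph LoRA-friendly.
smash_config = SmashConfig()
smash_config["compiler"] = "stable_fast"  # illustrative compiler choice
pipe = smash(model=pipe, smash_config=smash_config)

# Standard Diffusers LoRA calls: load, generate, swap, generate again.
pipe.load_lora_weights("your-org/style-lora-a")  # placeholder LoRA id
image_a = pipe("a watercolor fox").images[0]

pipe.unload_lora_weights()
pipe.load_lora_weights("your-org/style-lora-b")  # placeholder LoRA id
image_b = pipe("a watercolor fox").images[0]     # no recompilation warmup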

Pruna nodes built for ComfyUI

Our solution includes a compilation node for optimized execution and three caching nodes that reuse computation. The Pruna node is the fastest solution for running diffusion models in ComfyUI.

Full vLLM Compatibility (Coming Soon)

We're actively working to bring full vLLM compatibility to Pruna, and you can already load Pruna-optimized models using supported quantizers like AutoAWQ, BitsAndBytes, GPTQ, and TorchAO.
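
For example, a checkpoint quantized with AWQ can already be served through vLLM's standard quantization support. A minimal sketch, with a placeholder model path:

from vllm import LLM, SamplingParams

# Point vLLM at a Pruna-quantized AWQ checkpoint (placeholder path).
llm = LLM(model="./my-pruna-awq-model", quantization="awq")

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Explain model quantization in one sentence."], params)
print(outputs[0].outputs[0].text)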

Curious how? Let’s chat.

Speed Up Your Models With Pruna AI

Inefficient models drive up costs, slow down your productivity, and increase carbon emissions. Make your AI more accessible and sustainable with Pruna AI.

pip install pruna