The AI Efficiency Tool

Make your AI models efficient

Pruna is a frictionless solution to help you optimize and compress your ML models for efficient inference

Stable Diffusion 2.1
300% faster with Pruna AI
270+ publications in machine learning

Efficient ML made frictionless

Only a few lines of code to automatically adapt and combine the best
machine learning efficiency and compression methods for your use-case.

Adapts to your ML tasks

Make your pipelines efficient across all the tasks involved, whether in GenAI, LLMs, Computer Vision, NLP, Graphs & more

Adapts to model architectures

Keep the freedom to try new models and customize your model architecture for your needs; Pruna takes care of the rest

Adapts to your hardware

Find the best compute provider for your needs and budget, then leverage Pruna to squeeze out as much efficiency as you can

Adapts to your workflows

Create customized efficiency configs based on your needs, save and load the efficient models easily, and don't worry about compatibility

Photo of Prof. Stephan Günnemann (credit: Astrid Eckert / TUM)

"As billions are invested in AI development, it is imperative to maximize the efficiency and impact of these resources."

Prof. Stephan Günnemann
Cofounder at Pruna AI
Professor of Data Analytics and Machine Learning at the Technical University of Munich

Frequently Asked Questions

How does Pruna make models more efficient?

Our product adapts and combines the best efficiency methods for each use-case. This can include quantization, pruning, compilation and other algorithmic optimizations from the latest research and our own work. You can see the details in our documentation and each Hugging Face model's README.
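To give a concrete (and deliberately simplified) sense of one of these technique families, here is a minimal sketch of unstructured magnitude pruning in plain Python: the smallest-magnitude weights are zeroed out, shrinking the effective model. This is a generic illustration of the idea, not Pruna's implementation; real pruning operates on tensors and is followed by fine-tuning or calibration.

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the given fraction of smallest-magnitude weights.

    Generic sketch of unstructured magnitude pruning -- not Pruna's
    actual implementation.
    """
    k = int(len(weights) * sparsity)                # how many weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:                             # k smallest magnitudes
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.3, -0.8, 0.01, 0.4]
print(magnitude_prune(w, sparsity=0.5))             # half the weights become 0.0
```

With sparsity=0.5, the three smallest entries (0.01, -0.05, 0.3) are zeroed while the large weights survive, which is why moderate pruning often costs little accuracy.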

How big are the improvements?

We showcase detailed results for specific models, hardware, and parameters in our list of models on Hugging Face. Gains are often 2-10x, sometimes more and sometimes less. Exact results depend on your own pipelines, so the best way to find out is to request a trial.

Does the model run on my side or Pruna side?

Your side. Pruna is a tool that makes your models more efficient for your infrastructure, whether that's a cloud provider you selected (AWS, Google Cloud...), your own cluster, or an edge device.

Does the model quality change?

It depends on the specific config you select in our product. Some configs do not change quality, while others can slightly vary the output (usually to make the model even faster and smaller). Choose what suits you best, or let our product choose for you. We have put a lot of work into having the product adapt efficiency methods in a way that minimizes their combined impact on model output.
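To illustrate why some configs can slightly vary the output, here is a minimal sketch of symmetric 8-bit quantization in plain Python: each weight is rounded onto one of 255 integer levels, which makes the model smaller and faster to compute at the cost of a small, bounded rounding error. This is a generic illustration of the trade-off, not Pruna's actual quantization scheme.

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: round floats onto integer levels in [-127, 127].

    Generic sketch of the efficiency/quality trade-off -- not Pruna's
    actual quantization scheme.
    """
    scale = max(abs(v) for v in values) / 127   # float step per integer level
    q = [round(v / scale) for v in values]      # these small ints get stored
    return [qi * scale for qi in q]             # dequantized approximation

vals = [0.12, -0.5, 0.33, 0.9]
deq = quantize_int8(vals)
max_err = max(abs(a - b) for a, b in zip(vals, deq))
# Rounding to the nearest level moves each value by at most scale / 2,
# i.e. under 0.4% of the largest weight in this toy example.
```

Because the error per weight is bounded by half a quantization step, outputs usually change only slightly; whether that is acceptable is exactly what the config choice (and your own benchmarks) should decide.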

How much does it cost?

You can use the efficient models we put on Hugging Face for free (if you respect the original model's license). These are optimized for inference for specific but popular use-cases. If you want the same for other custom models and use-cases, you will need access to our product. Pricing varies and is meant to be win-win, so you get more than you pay for.

Is this for training or for inference?

Our current product makes your AI models more efficient at inference. Use it after training your models and before deploying them on your target hardware. Our next product iteration will make your model training more efficient too and we're eager for people to try it :)

How do you smash AI models?

Our approach integrates a suite of cutting-edge AI model compression techniques. These methods are the culmination of our years of research and numerous presentations at ML conferences including NeurIPS, ICML, and ICLR.

What do you need to smash my AI model?

Our product only needs your AI model and specifications about your target hardware for inference. The smashed models could be less flexible if you have a very specific use-case; that can be worked out with a little support.

Are there any risks?

We aim to maintain the predictive performance of all smashed AI models, ensuring they're as accurate as their original versions. However, while the practical results have consistently met our goals, we cannot provide a theoretical guarantee that predictions will exactly match the original model's. We recommend testing the smashed models on your own internal benchmarks.

Stop wasting compute & money

Tell us about your use-case, measure what Pruna can do for you and focus on what you do best.