Send us a message.

Tell us about the models you want smashed and get our experts' input on how we can best help you with your use case.

Frequently Asked Questions.

How does Pruna make models more efficient?

Our product adapts and combines the best efficiency methods for each use case. These can include quantization, pruning, compilation, and other algorithmic optimizations from the latest research and our own work. You can see the details in our documentation and in each Hugging Face model's README.
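
To make those terms concrete, here is a minimal, self-contained sketch of two of the building blocks mentioned above (dynamic quantization and compilation) using plain PyTorch. It is illustrative only, not Pruna's implementation; our product selects, tunes, and combines such methods automatically for your model and hardware.

```python
# Illustrative only: two of the building blocks named above, applied by hand
# with plain PyTorch. Pruna's product picks and combines such methods for you.
import torch
import torch.nn as nn

# A stand-in network; in practice this would be your own model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

# Quantization: store and compute the Linear layers in int8 instead of float32,
# shrinking the model and speeding up CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Compilation: let the compiler fuse and specialize the forward pass.
compiled = torch.compile(model)

x = torch.randn(1, 512)
with torch.no_grad():
    print(quantized(x).shape, compiled(x).shape)  # torch.Size([1, 10]) twice
```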

How big are the improvements?

We showcase detailed results for specific models, hardware configurations and parameters in our list of models on Hugging Face. Gains are often 2-10x, sometimes more and sometimes less. Exact results will depend on your own pipelines, so the best way to find out is to request a trial.
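
Because results depend on your pipeline, the most reliable check is to time the original and the optimized model on your own inputs. Below is a minimal, generic latency harness in PyTorch; the model and input names are placeholders for your own objects.

```python
# Generic latency comparison; all names here are placeholders for your own
# model, its optimized ("smashed") counterpart, and a representative input.
import time
import torch

def measure_latency(model, example_input, warmup=10, iters=100):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):            # warm up caches / lazy compilation
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()       # make GPU timings accurate
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

# before = measure_latency(original_model, example_input)
# after = measure_latency(smashed_model, example_input)
# print(f"speedup: {before / after:.2f}x")
```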

Does the model run on my side or Pruna's side?

Your side. Pruna is a tool to make your models more efficient for your infrastructure, whether that's on a cloud provider you selected (AWS, Google Cloud, ...), on your own cluster, or on an edge device.

Does the model quality change?

It depends on the specific configs you select for our product. Some configs do not change quality, while others can slightly vary the output (usually to make the model even faster and smaller). Choose what suits you best or let our product decide for you. We put a lot of work into having the product adapt efficiency methods in a way that minimizes their combined impact on model output.

How much does it cost?

You can use the efficient models we put on Hugging Face for free (as long as you respect the original model's license). These are optimized for inference for specific but popular use cases. If you want the same for other custom models and use cases, you will need access to our product. Pricing varies and is meant to be win-win, so you get more than you pay for.
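
The models we publish on Hugging Face load with the standard Hugging Face libraries. The snippet below sketches this for an image-generation model; the repository id is a placeholder, so pick an actual model from our Hugging Face page and check its README and license first.

```python
# Loading one of the freely published smashed models with standard Hugging Face
# tooling. The repository id is a placeholder; replace it with a real model
# from our Hugging Face page and respect the original model's license.
import torch
from diffusers import DiffusionPipeline  # use transformers instead for language models

pipe = DiffusionPipeline.from_pretrained(
    "PrunaAI/<smashed-model-of-your-choice>",  # placeholder repository id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a lighthouse at sunset, photorealistic").images[0]
image.save("sample.png")
```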

Is this for training or for inference?

Our current product makes your AI models more efficient at inference. Use it after training your models and before deploying them on your target hardware. Our next product iteration will make your model training more efficient too, and we're eager for people to try it :)

How do you smash AI models?

Our approach integrates a suite of cutting-edge AI model compression techniques. These methods are the culmination of our years of research and numerous presentations at ML conferences including NeurIPS, ICML, and ICLR.

What do you need to smash my AI model?

Our product only needs your AI model and specifications about your target hardware for inference. The smashed models could be less flexible if you have a very specific use case, but that can be worked out with a little support.
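
Conceptually, the two inputs look like this. The names in the sketch are hypothetical and for illustration only; they do not reflect our actual API, so see the documentation for the real interface.

```python
# Hypothetical illustration of the two inputs described above: your model and
# a description of the inference hardware. None of these names are our real API.
import torch

model = torch.load("checkpoints/my_model.pt")   # your trained model

target_hardware = {
    "device": "cuda",        # where the model will run in production
    "gpu": "A100-40GB",      # the card on your cloud or cluster
    "batch_size": 8,         # typical inference batch size
    "precision": "fp16",     # precision you can tolerate at inference
}

# optimized = smash(model, target_hardware)     # hypothetical entry point
```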

Are there any risks?

We aim to maintain the predictive performance of all smashed AI models, ensuring they're as accurate as their original versions. However, while practical results have consistently met our goals, we cannot provide a theoretical guarantee that predictions will exactly match the original model's. We recommend testing the smashed models on your own internal benchmarks.