At Pruna AI, we are huge fans of the FLUX.1[dev] model from the incredible team at Black Forest Labs.
It’s the highest-ranked open-weights model on the Artificial Analysis text-to-image leaderboard.
With over 8,500 likes, it’s also the most popular model on Hugging Face.
However, with 12B parameters, generating a single high-resolution image can take up to 30 seconds on modern hardware, which is too slow for many applications. Fortunately, there are ways to reduce inference time. In this post, we compare three such approaches: torch.compile, TensorRT, and our own Pruna optimization engine. The summary of this comparison is presented in the table below.