At Pruna AI, we are huge fans of the FLUX.1[dev] model from the incredible team at Black Forest Labs.
It’s the highest-ranked open-weights model on the Artificial Analysis text-to-image leaderboard.
With over 8,500 likes, it’s also the most popular model on Hugging Face.
However, with 12B parameters, generating a single high-resolution image can take up to 30 seconds on modern hardware, which is too slow for many applications. Fortunately, there are ways to reduce inference time. In this post, we compare three such approaches: torch.compile, TensorRT, and our own Pruna optimization engine. The summary of this comparison is presented in the table below.