Open-source PixArt-δ image generator spits out high-resolution AI images in 0.5 seconds

Key Points:

  • PixArt-δ challenges Stable Diffusion with faster, more accurate, high-resolution image generation, and it outperforms its predecessor, PixArt-α.
  • PixArt-δ integrates the Latent Consistency Model (LCM) and ControlNet. LCM is the component that speeds up inference, enabling high-quality images in just half a second.
  • PixArt-δ trains efficiently on V100 GPUs and supports 8-bit inference, which improves its usability and accessibility. The ControlNet module adds finer, explicit control over text-to-image generation while preserving image quality.

Summary:

Move over, Stable Diffusion: PixArt-δ is giving you a run for your money. This new open-source image generator is faster, more accurate, and produces high-resolution images at remarkable speed. Researchers from Huawei Noah’s Ark Lab, Dalian University of Technology, Tsinghua University, and Hugging Face collaborated on PixArt-δ, a text-to-image synthesis framework designed to rival the Stable Diffusion family. The model is a major step beyond its predecessor, PixArt-α, which was already known for quickly generating 1024 x 1024 pixel images. PixArt-δ integrates the Latent Consistency Model (LCM) and ControlNet; LCM speeds up inference so that the model produces high-quality images in just half a second, whereas the previous model took seven times longer (roughly 3.5 seconds).

PixArt-δ’s results are even more impressive when compared to its competition. It outshines SDXL Turbo in resolution and consistency, and it follows prompts more accurately. Its design allows for efficient training on V100 GPUs with 32GB of VRAM in less than a day, and its 8-bit inference capability makes 1024-pixel image synthesis possible even on 8GB GPUs, which significantly improves its usability and accessibility. Moreover, the ControlNet module lets users steer text-to-image diffusion models with reference images. This novel ControlNet architecture is tailored for transformer-based models, providing explicit controllability while maintaining the quality of image generation.
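
To give a rough sense of why 8-bit inference helps on small GPUs, here is a minimal sketch of how the memory needed for model weights scales with bit width. The parameter count below is a made-up round number for illustration, not PixArt-δ's actual size, and the estimate ignores activations, the text encoder, and any other runtime overhead.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical parameter count, chosen only to make the comparison concrete.
HYPOTHETICAL_PARAMS = 1_000_000_000

if __name__ == "__main__":
    for bits in (32, 16, 8):
        gib = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
        print(f"{bits}-bit weights: {gib:.2f} GiB")
```

Under these assumptions, halving the bit width halves the weight footprint, which is the general reason lower-precision inference can fit on cards with less VRAM.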

In conclusion, PixArt-δ is a force to be reckoned with among open-source image generators. Its speed, accuracy, resolution, and controllability place it at the forefront of this space, and its modest hardware requirements make it usable by a wider audience.

 

©2024 The Horizon