- Stable Zero123 generates novel views of an object, demonstrating 3D understanding of the object's appearance from various angles, with notably improved quality over Zero-1-to-3 and Zero123-XL thanks to improved training data and elevation conditioning.
- Based on Stable Diffusion 1.5, the model consumes the same amount of VRAM as SD1.5 to generate one novel view. Using Stable Zero123 to generate full 3D objects requires more time and memory (24 GB of VRAM recommended).
- Stable Zero123 produces notably improved results compared to the previous state of the art, Zero123-XL. This is achieved through three key innovations:
  - An improved training dataset, heavily filtered from Objaverse to preserve only high-quality 3D objects, which we rendered much more realistically than previous methods.
  - Elevation conditioning: during training and inference, the model is given an estimated camera elevation angle, allowing it to make more informed, higher-quality predictions.
  - A pre-computed dataset (pre-computed latents) and an improved dataloader supporting larger batch sizes, which, combined with the first innovation, yielded a 40× speed-up in training compared to Zero123-XL.
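To make the camera conditioning concrete: Zero-1-to-3-style models typically encode the pose change between the conditioning image and the requested view as a small vector fed to the diffusion model alongside the image. A minimal sketch of that encoding is below, assuming the 4-dimensional relative-pose format from the original Zero-1-to-3 paper (Δelevation, sin Δazimuth, cos Δazimuth, Δradius); the function name is hypothetical and this is not Stability AI's actual implementation.

```python
import math

def camera_conditioning(elev_cond, azim_cond, radius_cond,
                        elev_tgt, azim_tgt, radius_tgt):
    """Encode the relative pose between the conditioning view and the
    target view as the 4-d vector used by Zero-1-to-3-style models:
    (d_elevation, sin(d_azimuth), cos(d_azimuth), d_radius).

    Angles are in radians. Hypothetical helper for illustration only.
    """
    d_elev = elev_tgt - elev_cond
    d_azim = azim_tgt - azim_cond
    d_radius = radius_tgt - radius_cond
    return [d_elev, math.sin(d_azim), math.cos(d_azim), d_radius]

# Example: request a view rotated 30 degrees around the object, at the
# same elevation and camera distance as the conditioning image.
cond = camera_conditioning(0.0, 0.0, 1.5,
                           0.0, math.radians(30), 1.5)
```

Providing an estimated elevation for the conditioning image (rather than assuming a fixed one) is what lets Stable Zero123 resolve the pose ambiguity that hurt earlier Zero123 variants.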
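The pre-computed-latents idea is a general caching pattern: run the expensive VAE encoder over the dataset once, store the resulting latents, and have the training dataloader read latents directly, which frees GPU memory for larger batches. A minimal sketch of the pattern follows, with a toy stand-in for the encoder; the function names and on-disk format are assumptions for illustration, not Stability AI's pipeline.

```python
import os
import pickle
import tempfile

def precompute_latents(dataset, encode, cache_dir):
    """One-off pass: run the (expensive) encoder over every image and
    store the latents on disk, keyed by the dataset item's key.
    Training can then load latents directly and skip the encoder."""
    os.makedirs(cache_dir, exist_ok=True)
    for key, image in dataset.items():
        path = os.path.join(cache_dir, f"{key}.pkl")
        if not os.path.exists(path):  # skip work already done
            with open(path, "wb") as f:
                pickle.dump(encode(image), f)

def load_latent(cache_dir, key):
    """What the training dataloader calls instead of the encoder."""
    with open(os.path.join(cache_dir, f"{key}.pkl"), "rb") as f:
        return pickle.load(f)

# Toy demo: the "encoder" just halves each value.
cache = tempfile.mkdtemp()
toy_encode = lambda img: [x * 0.5 for x in img]
dataset = {"obj1": [2.0, 4.0], "obj2": [6.0]}
precompute_latents(dataset, toy_encode, cache)
```

In a real setup the encoder pass runs on GPU once per dataset, and the dataloader streams the cached latents with cheap CPU I/O, which is what enables the higher batch sizes mentioned above.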