
Trained diffusion models generating 8M+ images/day at Stability AI. Ex-DeepMind.

Optimized SDXL inference to 3x throughput at one-third the cost per image
Stable Diffusion XL inference cost $0.018 per image at production quality, making the API economically challenging at scale. GPU utilization averaged 52% due to inefficient batching, and users experienced 8-12 second generation times.
Implemented classifier-free guidance distillation to reduce the required inference steps from 50 to 18 without perceptual quality loss. Built a dynamic batching system that groups requests by resolution and step count. Optimized the VAE decoder with TensorRT and added flash attention throughout the UNet.
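The batching idea can be sketched as follows: requests are bucketed by (resolution, step count) so every GPU batch contains tensors of identical shape that finish in the same number of denoising steps. This is a minimal illustration, not the production scheduler; all function and field names here are hypothetical.

```python
from collections import defaultdict

def group_requests(requests, max_batch=8):
    """Bucket pending generation requests by (width, height, steps) so each
    batch holds same-shaped tensors that complete in the same step count.
    Illustrative sketch only; names and structure are assumptions."""
    buckets = defaultdict(list)
    for req in requests:
        buckets[(req["width"], req["height"], req["steps"])].append(req)

    batches = []
    for key, reqs in buckets.items():
        # Split each bucket into GPU-sized batches.
        for i in range(0, len(reqs), max_batch):
            batches.append((key, reqs[i:i + max_batch]))
    return batches
```

A real scheduler would also bound how long a request can wait for its bucket to fill, trading a little latency for higher occupancy; the sketch above shows only the grouping step.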
Cost per image dropped from $0.018 to $0.006. Throughput increased 3.1x per GPU. Generation time fell to 2.8 seconds for standard resolution. GPU utilization hit 87%. Savings enabled Stability to offer a competitive free tier.

Built a perceptual quality metric aligned with human preference
FID correlated poorly with human aesthetic preferences for generated images; model changes that improved FID sometimes degraded actual visual quality.
Developed a multi-dimensional perceptual quality score combining CLIP-based aesthetics, structural coherence, and artifact detection, trained on 500K human preference pairs collected via A/B testing.
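The shape of such a composite score can be sketched as a weighted combination of the three sub-scores. The linear form, the specific weights, and the function names below are illustrative assumptions; the source says only that the metric combines these components and was trained on human preference pairs.

```python
def quality_score(aesthetics, coherence, artifact_prob,
                  weights=(0.5, 0.3, 0.2)):
    """Combine three sub-scores into one perceptual quality number.

    aesthetics:    CLIP-based aesthetic score, assumed normalized to [0, 1]
    coherence:     structural-coherence score, assumed in [0, 1]
    artifact_prob: probability an artifact detector fires, in [0, 1]

    The linear weighting and these particular weights are hypothetical;
    in practice the combination would be fit to human preference data.
    """
    w_a, w_c, w_f = weights
    # Artifacts lower quality, so the detector's probability enters inverted.
    return w_a * aesthetics + w_c * coherence + w_f * (1.0 - artifact_prob)
```

Given a preference pair, the metric agrees with the human label when the preferred image receives the higher score, which is the property the reported 0.89 correlation measures.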
The new metric correlated at 0.89 with human preferences, vs. 0.61 for FID. It was adopted as the primary quality metric for 3 model training runs, cutting manual review cycles by 70%.