
Trained diffusion models generating 8M+ images/day at Stability AI. Ex-DeepMind.

Optimized SDXL inference to 3x throughput at one-third the cost per image
Stable Diffusion XL inference cost $0.018 per image at production quality, making the API economically challenging at scale. GPU utilization averaged 52% due to inefficient batching, and users experienced 8-12 second generation times.
Implemented classifier-free guidance distillation to reduce the required inference steps from 50 to 18 without perceptual quality loss. Built a dynamic batching system that groups requests by resolution and step count. Optimized the VAE decoder with TensorRT and added flash attention throughout the UNet.
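The batching idea can be sketched as follows: requests are bucketed by (resolution, step count) so every GPU batch contains tensors of identical shape that finish in the same number of denoising steps. This is a minimal illustration, not the production scheduler; all function and field names here are hypothetical.

```python
from collections import defaultdict

def group_requests(requests, max_batch=8):
    """Bucket pending generation requests by (width, height, steps) so each
    batch holds same-shaped tensors that complete in the same step count.
    Illustrative sketch only; names and structure are assumptions."""
    buckets = defaultdict(list)
    for req in requests:
        buckets[(req["width"], req["height"], req["steps"])].append(req)

    batches = []
    for key, reqs in buckets.items():
        # Split each bucket into GPU-sized batches.
        for i in range(0, len(reqs), max_batch):
            batches.append((key, reqs[i:i + max_batch]))
    return batches
```

A real scheduler would also bound how long a request can wait for its bucket to fill, trading a little latency for higher occupancy; the sketch above shows only the grouping step.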
Cost per image dropped from $0.018 to $0.006. Throughput increased 3.1x per GPU. Generation time fell to 2.8 seconds for standard resolution. GPU utilization hit 87%. Savings enabled Stability to offer a competitive free tier.

Built a perceptual quality metric aligned with human preference
FID correlated poorly with human aesthetic preferences for generated images; model changes that improved FID sometimes degraded actual visual quality.
Developed a multi-dimensional perceptual quality score combining CLIP-based aesthetics, structural coherence, and artifact detection, trained on 500K human preference pairs collected via A/B testing.
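The shape of such a composite score can be sketched as a weighted combination of the three sub-scores. The linear form, the specific weights, and the function names below are illustrative assumptions; the source says only that the metric combines these components and was trained on human preference pairs.

```python
def quality_score(aesthetics, coherence, artifact_prob,
                  weights=(0.5, 0.3, 0.2)):
    """Combine three sub-scores into one perceptual quality number.

    aesthetics:    CLIP-based aesthetic score, assumed normalized to [0, 1]
    coherence:     structural-coherence score, assumed in [0, 1]
    artifact_prob: probability an artifact detector fires, in [0, 1]

    The linear weighting and these particular weights are hypothetical;
    in practice the combination would be fit to human preference data.
    """
    w_a, w_c, w_f = weights
    # Artifacts lower quality, so the detector's probability enters inverted.
    return w_a * aesthetics + w_c * coherence + w_f * (1.0 - artifact_prob)
```

Given a preference pair, the metric agrees with the human label when the preferred image receives the higher score, which is the property the reported 0.89 correlation measures.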
The new metric correlated at 0.89 with human preferences, vs. 0.61 for FID. It was adopted as the primary quality metric for 3 model training runs, cutting manual review cycles by 70%.