
GitHub
ML engineer working on model evaluation and fine-tuning at Anthropic. Spelman + Georgia Tech. Passionate about responsible AI and making ML systems reliable.

One-click fine-tuning pipeline cutting model training from 2 weeks to 4 hours
Internal teams needed custom fine-tuned models, but the process was manual and error-prone: each fine-tune took an ML engineer two weeks of babysitting.
Built an end-to-end fine-tuning pipeline using Modal for compute, W&B for experiment tracking, and Claude for data-quality checks. One-click training with automatic hyperparameter tuning.
Fine-tuning time: 2 weeks → 4 hours. Internal teams can now self-serve model training. Shipped 15 custom models in Q1 vs. 3 the previous quarter. Compute costs down 40%.
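The one-click flow above can be sketched as a single entry point that chains the stages together. All names here (`FineTuneJob`, `run_pipeline`, the stage functions) are illustrative stand-ins, not the actual internal API:

```python
from dataclasses import dataclass, field

@dataclass
class FineTuneJob:
    base_model: str
    dataset_path: str
    stages_done: list = field(default_factory=list)

def check_data_quality(job: FineTuneJob) -> None:
    # Stand-in for the LLM-based review of training samples
    # (formatting and label errors) before any compute is spent.
    job.stages_done.append("data_quality")

def tune_hyperparameters(job: FineTuneJob) -> dict:
    # Stand-in for the automatic hyperparameter search, e.g. a small
    # sweep over learning rate and batch size tracked per run.
    job.stages_done.append("hparam_tuning")
    return {"learning_rate": 2e-5, "batch_size": 32}

def launch_training(job: FineTuneJob, hparams: dict) -> None:
    # Stand-in for dispatching the fine-tune to remote GPU compute.
    job.stages_done.append("training")

def run_pipeline(base_model: str, dataset_path: str) -> FineTuneJob:
    """Single entry point: the 'one click'."""
    job = FineTuneJob(base_model, dataset_path)
    check_data_quality(job)
    hparams = tune_hyperparameters(job)
    launch_training(job, hparams)
    return job

job = run_pipeline("base-v1", "s3://bucket/train.jsonl")
print(job.stages_done)  # ['data_quality', 'hparam_tuning', 'training']
```

Keeping every stage behind one entry point is what lets non-ML teams self-serve: there is no intermediate state to babysit.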

Inference optimization cutting latency 70% and serving costs 64%
Model inference was slow and expensive: P50 latency was 800ms, GPU utilization was only 40%, and serving costs ran $50k/month for a single model.
Profiled the inference pipeline, then applied quantization and batching optimizations. Used Modal for autoscaling and built a caching layer for common queries.
P50 latency: 800ms → 240ms. GPU utilization: 40% → 85%. Serving costs: $50k → $18k/month. Can now serve 3x the traffic on the same infrastructure.
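The caching layer for common queries can be sketched with an in-process LRU cache in front of the model call, so identical prompts only pay for inference once. `run_model` is a hypothetical placeholder for the actual (expensive) GPU call, and the cache size is illustrative:

```python
import functools

CALLS = {"model": 0}

def run_model(prompt: str) -> str:
    # Placeholder for the real inference call; the counter lets us
    # observe how many requests actually reach the model.
    CALLS["model"] += 1
    return prompt.upper()

@functools.lru_cache(maxsize=4096)
def cached_infer(prompt: str) -> str:
    # Repeated identical prompts are served from the cache,
    # never reaching run_model a second time.
    return run_model(prompt)

for p in ["hi", "hi", "hello", "hi"]:
    cached_infer(p)
print(CALLS["model"])  # 2
```

In production the cache would typically be shared across replicas (e.g. keyed by a prompt hash in an external store), but the effect is the same: cache hits cost no GPU time, which raises effective utilization and cuts cost per request.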

Evaluation framework raising coverage of critical behaviors from 30% to 90%
Model evaluation was ad hoc and incomplete: critical capability regressions were being caught only in production, and the eval suite covered just 30% of important behaviors.
Designed a comprehensive evaluation framework: used Claude to generate diverse test cases, built automated regression detection, and created dashboards for eval tracking.
Eval coverage: 30% → 90% of critical behaviors. False-positive rate cut 50%. Caught 12 regressions pre-release that would otherwise have shipped. Framework adopted org-wide.
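The core of automated regression detection can be sketched as a comparison of each eval's score against a stored baseline, flagging drops beyond a tolerance. The function name, score format, and the 2-point threshold here are all illustrative assumptions:

```python
THRESHOLD = 2.0  # max allowed score drop, in eval points (illustrative)

def detect_regressions(baseline: dict, candidate: dict,
                       threshold: float = THRESHOLD) -> list:
    """Return names of evals whose score dropped more than `threshold`.

    Evals missing from the candidate run count as a score of 0, so a
    silently dropped eval is surfaced as a regression too.
    """
    return sorted(
        name for name, base_score in baseline.items()
        if base_score - candidate.get(name, 0.0) > threshold
    )

baseline = {"math": 81.0, "coding": 74.5, "safety": 92.0}
candidate = {"math": 80.2, "coding": 69.0, "safety": 92.4}
print(detect_regressions(baseline, candidate))  # ['coding']
```

Tuning the threshold is what controls the false-positive rate: too tight and run-to-run noise trips the alarm, too loose and real regressions ship.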