PhD theses from Stanford CS that specifically cover GenAI
1. “Learning to generate data by estimating gradients of the data distribution”
- Author: Yang Song (Advised by Stefano Ermon)
- Core Focus: This thesis is foundational to modern generative AI. It develops score-based generative models, which are closely related to, and now commonly grouped with, diffusion models. It shows how to generate high-quality samples (images, audio) by estimating the score function, i.e. the gradient of the log data density, and then sampling with Markov chain Monte Carlo methods such as Langevin dynamics. The approach challenged the dominance of GANs in image generation.
- Link / Source: Available via Stanford Digital Repository (PURL). [1, 2]
2. “From Pre-trained Language Models to Useful AI Systems”
- Author: Eric Mitchell (Advised by Chelsea Finn, Co-Advised by Chris Manning)
- Core Focus: This dissertation focuses on post-training methods for aligning and updating LLMs. It addresses how to turn a raw base language model into a helpful, safe, and factually accurate conversational assistant, using methods such as model editing and Direct Preference Optimization (DPO).
- Award: Recipient of the 2025 Arthur Samuel PhD Best Thesis Award. [1, 2]
3. “Efficient systems for serving neural retrievers and language models” [1]
- Author: Keshav Santhanam (Advised by Matei Zaharia and Christopher Potts)
- Core Focus: This work addresses the high computational cost of generative Transformer-based models. It co-designs algorithms and systems (such as PLAID and ALTO) to speed up neural retrieval, Retrieval-Augmented Generation (RAG) pipelines, and LLM inference.
- Link / Source: Available via Stanford Digital Repository (PURL). [1]
4. “On the evaluation of deep generative models”
- Author: Sharon Zhou (Advised by Andrew Ng)
- Core Focus: Evaluating generative models is notoriously difficult because the space of text and images they can produce is effectively unbounded. This thesis introduces new evaluation metrics and human-in-the-loop validation frameworks, grounded in psychophysics, to reliably measure the perceptual realism and structural consistency of deep generative models.
- Link / Source: Available via Stanford Digital Repository (PURL). [1, 2]
5. “Generating content for AI, and AI-generated content: synthetic data and simulated environments for embodied AI” [1, 2]
- Author: Bokui Shen (Advised by Silvio Savarese)
- Core Focus: This dissertation bridges GenAI and robotics. It examines how generative models can be used to build realistic synthetic data and simulated training environments, so that AI agents can learn real-world 3D object manipulation in simulation.
- Link / Source: Available via Stanford Digital Repository (PURL). [1]