Security, Privacy, and Provenance for Generative AI
Jaiden Fairoze
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2026-126
May 14, 2026
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2026/EECS-2026-126.pdf
As generative AI transitions from novelty to critical infrastructure, ensuring its provenance, security, and privacy has become paramount. This dissertation studies all three concerns through the lens of cryptography, using its tools to construct provable safety guarantees and expose theoretically grounded limitations.
Provenance asks whether AI-generated content can be reliably attributed to its source. We construct the first publicly-detectable and unforgeable watermarking scheme for language models. For images, where robustness is a greater concern, we show that a watermarking scheme can in principle be simultaneously publicly-detectable and robust; however, the robust embedding models such a scheme would rely on are infeasible to instantiate with current machine learning capabilities.
Security asks whether lightweight guardrails can reliably defend language models. We show that adversaries can encode malicious intent into computationally hard-to-detect structures, exploiting the resource asymmetry between a guardrail and the model it protects. Our attack, controlled-release prompting, succeeds at near-perfect rates against the production chat interfaces of major AI platforms that resist baseline jailbreaks, and a systematic evaluation of open-weight prompt guards further supports the asymmetry hypothesis.
Privacy asks what happens when a model is tasked with keeping a secret in its context. We study inadvertent context leakage through a predicate-inference game in which an adversary tries to recover a secret from model outputs. We find that proprietary models are broadly susceptible to bit-level secret leakage. Leakage grows with model capability, indicating that it is intrinsic to stronger instruction-following rather than an incidental flaw.
Advisor: Sanjam Garg