Our goal is to build reliable, aligned, and trustworthy AI through rigorous analysis and principled methods.
We are currently excited about:
  • Rethinking pre-training to enable better post-training for safety and adaptation
  • Mitigating reward hacking, hallucinations, and security vulnerabilities in coding agents
  • Enabling diverse and creative solutions for open-ended tasks with language models
For the 2025–2026 cycle, we’re recruiting students and postdocs interested in understanding and advancing the science and reliability of foundation models.

Recent News