Yu Bai

Curriculum Vitae | Google Scholar Profile | Email | GitHub

About Me

I am currently a Researcher at OpenAI.

Previously, I was a Senior Research Scientist at Salesforce AI Research in Palo Alto, CA. My research interests lie broadly in machine learning, including deep learning, large language models/foundation models, reinforcement learning, learning in games, and uncertainty quantification. Before joining Salesforce, I completed my PhD in Statistics at Stanford University (specializing in machine learning) in September 2019, where I was fortunate to be advised by Prof. John Duchi and was a member of the Machine Learning Group. Prior to Stanford, I was an undergrad in mathematics at Peking University.

My research has focused on large language models; theoretical foundations of deep learning (blog post); reinforcement learning theory (slides on partially observable RL); multi-agent reinforcement learning and games (blog post, slides on MARL, slides on Extensive-Form Games); and uncertainty quantification (slides), among others.

Research Focus and Selected Publications

Selected Recent Work

GPT-5.5.
OpenAI, 2026.

GPT-5.4 & GPT-5.3-Codex.
OpenAI, 2026.

GPT-5.2.
OpenAI, 2025.

GPT-5.
OpenAI, 2025. [system card]

OpenAI o1.
OpenAI, 2024. [system card]

Foundation Models and Transformers

Our goal is to discover new capabilities and new understandings of transformers and large language models.

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs.
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei.

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations.
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai.
ICLR 2024.

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining.
Licong Lin, Yu Bai, Song Mei.
ICLR 2024.

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection.
Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei.
NeurIPS 2023 (Oral). [Code]

Multi-Agent Reinforcement Learning Theory

We developed the first line of provably efficient algorithms for multi-agent reinforcement learning.

Breaking the Curse of Multiagency: Provably Efficient Decentralized Multi-Agent RL with Function Approximation.
Yuanhao Wang, Qinghua Liu, Yu Bai, Chi Jin.
COLT 2023.

Policy Optimization for Markov Games: Unified Framework and Faster Convergence.
Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai.
NeurIPS 2022.

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?
Ziang Song, Song Mei, Yu Bai.
ICLR 2022.

Near-Optimal Reinforcement Learning with Self-Play.
Yu Bai, Chi Jin, Tiancheng Yu.
NeurIPS 2020.

Provable Self-Play Algorithms for Competitive Reinforcement Learning.
Yu Bai, Chi Jin.
ICML 2020.

Deep Learning Theory

We developed optimization and generalization results for overparametrized neural networks beyond the Neural Tangent Kernel (NTK) regime, and identified provable advantages over the NTK regime.

Towards Understanding Hierarchical Learning: Benefits of Neural Representations.
Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher.
NeurIPS 2020.

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.
Yu Bai, Jason D. Lee.
ICLR 2020.

Partially Observable Reinforcement Learning

We designed sharp sample-efficient algorithms and studied the fundamental limits for partially observable reinforcement learning.

Lower Bounds for Learning in Revealing POMDPs.
Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai.
ICML 2023.

Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms.
Fan Chen, Yu Bai, Song Mei.
ICLR 2023 (Notable-top-25% / “Spotlight”).

Learning in Games

We designed near-optimal algorithms for learning equilibria in various multi-player games under bandit feedback.

Learning Rationalizable Equilibria in Multiplayer Games.
Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin.
ICLR 2023.

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent.
Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu.
NeurIPS 2022 (Oral).

Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games.
Ziang Song, Song Mei, Yu Bai.
NeurIPS 2022.

Near-Optimal Learning of Extensive-Form Games with Imperfect Information.
Yu Bai, Chi Jin, Song Mei, Tiancheng Yu.
ICML 2022.

Uncertainty Quantification in Machine Learning

We gave precise theoretical characterizations of the calibration and coverage of vanilla machine learning algorithms, and developed new uncertainty quantification algorithms with valid guarantees and improved efficiency.

Efficient and Differentiable Conformal Prediction with General Function Classes.
Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong.
ICLR 2022. [Code]

Understanding the Under-Coverage Bias in Uncertainty Estimation.
Yu Bai, Song Mei, Huan Wang, Caiming Xiong.
NeurIPS 2021 (Spotlight).

Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification.
Yu Bai, Song Mei, Huan Wang, Caiming Xiong.
ICML 2021.