Research

Research Interests

I am broadly interested in the theory, algorithms, and applications of machine learning. I am also interested in non-convex and convex optimization.

Recently, I have been dedicated to using theory to design algorithms elegantly.

Specifically, my recent research topics are:

  • Deep Learning Theory: theory and theory-inspired algorithms. [1][2][3][4][5][6][8][9][10][11][12][13][14][17][18]

    • Expressivity: Exploring the expressive power of Transformers through the lens of approximation theory [8][12]; the expressivity of mixture-of-experts (MoE) models [16].

    • Optimization: When training neural networks, why can optimization algorithms converge to global minima? [2][4][12]

    • Implicit Bias: When training neural networks, why can optimization algorithms converge to global minima with favorable generalization ability (even without any explicit regularization)? Examples include flat-minima bias [3][5][9][10][11] and max-margin bias [4][6].

    • Generalization: How to measure the generalization ability of neural networks. [1]

    • Algorithm Design: For machine learning problems, design new provable optimization algorithms which can (i) converge faster [10][13][17][18] and (ii) generalize better [6][10].

  • Transformers and Large Language Models: theory and algorithms, especially for LLM pre-training. [8][10][12][13][17][18]

    • Expressive Power: The expressive power and mechanisms of Transformers [8][12]; the expressivity of mixture-of-experts (MoE) models [16]; the mechanisms of in-context learning [12].

    • Algorithm Design: Design provably faster optimizers for training LLMs [10][13][17][18]; design more efficient model architectures.

  • Non-convex and Convex Optimization: theory and algorithms. [2][4][6][10][11][12][13][14][17][18]

    • Convex Optimization in ML. [6]

    • Non-convex Optimization in ML. [2][4][10][11][12][13][14][17][18]

    • Algorithm Design: Design provably faster / more stable optimizers for training neural networks [10][13][17][18]; accelerate convergence for problems with specific structure [6].

Recent Publications and Preprints

* indicates equal contribution, † indicates project lead.

Co-authors

  • Weinan E. Peking University; Princeton University; AI for Science Institute.

  • Chao Ma. Department of Mathematics, Stanford University.

  • Lei Wu. School of Mathematical Sciences, Peking University.

  • Weijie J. Su. Department of Statistics and Data Science, University of Pennsylvania.