Research

Research Interests

I am broadly interested in the theory, algorithms, and applications of machine learning. I am also interested in non-convex and convex optimization.

Recently, I have been dedicated to using theory to design algorithms elegantly.

Specifically, my recent research topics are:

  • Deep Learning Theory: theory and theory-inspired algorithms.

    • Expressivity: Exploring the expressive power of Transformers through the lens of approximation theory [8][12]; the expressivity of state-space models.

    • Optimization: When training neural networks, why can optimization algorithms converge to global minima? [2][4][11][12]

    • Implicit Bias: When training neural networks, why can optimization algorithms converge to global minima with favorable generalization ability (even without any explicit regularization)? Examples include flat-minima bias [3][5][9][10][11] and max-margin bias [4][6].

    • Generalization: How to measure the generalization ability of neural networks. [1]

    • Algorithm Design: For machine learning problems, design new optimization algorithms that can (i) converge faster [10]; (ii) generalize better [6][10].

  • Transformer and Large Language Model: theory and algorithm. [8][10][12]

    • Expressive Power: The expressive power and mechanisms of Transformers [8][12]; the mechanisms of in-context learning [12]; the expressivity of state-space models.

    • Algorithm Design: Design faster optimizers for training LLMs [10]; design more efficient model architectures; design more efficient data selection strategies.

  • Non-convex and Convex Optimization: theory and algorithm. [2][4][6][10][11][12]

    • Convex Optimization in ML. [6]

    • Non-convex Optimization in ML. [2][4][10][11][12]

    • Algorithm Design: Design faster optimizers for training neural networks [10]; accelerate convergence for problems with specific structure [6].

  • CV and NLP: algorithm and application. [7]

Recent Publications

Recent Preprints

* indicates equal contribution.

Co-authors

  • Weinan E. Peking University; Princeton University; AI for Science Institute.

  • Chao Ma. Department of Mathematics, Stanford University.

  • Lei Wu. School of Mathematical Sciences, Peking University.

  • Weijie J. Su. Department of Statistics and Data Science, University of Pennsylvania.