Research

Research Interests

I am broadly interested in the theory, algorithms, and applications of machine learning, as well as in convex and non-convex optimization. Recently, I have also been dedicated to using theory to guide the design of elegant algorithms. Specifically, my recent research topics are:

  • Deep Learning Theory: optimization, generalization, implicit bias, and approximation.

    • Optimization: When training neural networks, why do optimization algorithms converge to global minima? [1][4]

    • Implicit Bias: When training neural networks, why do optimization algorithms converge to global minima with favorable generalization ability, even without any explicit regularization? Examples include the flat-minima bias [2][5] and the max-margin bias [4][6].

    • Algorithm Design: For machine learning problems, designing new optimization algorithms that converge to global minima with better generalization ability. [6]

    • Approximation: Exploring the expressive power of deep neural networks through the lens of approximation theory. [7]

    • Generalization: Measuring the generalization ability of neural networks. [3]

  • Transformers and Large Language Models: theory and algorithms.

    • Expressive Power: The expressive power and mechanisms of Transformers [7]. (In preparation)

    • Algorithm Design: (In preparation)

  • Non-convex and Convex Optimization: theory and algorithms.

    • Convex Optimization in ML. [6]

    • Non-convex Optimization in ML. [1][4]

  • Computer Vision (CV) and Natural Language Processing (NLP): algorithms and applications. [9]

Recent Publications

Recent Preprints

Co-authors

  • Weinan E. Peking University; Princeton University; AI for Science Institute.

  • Chao Ma. Department of Mathematics, Stanford University.

  • Lei Wu. School of Mathematical Sciences, Peking University.

  • Weijie J. Su. Department of Statistics and Data Science, University of Pennsylvania.