Research
Research Interests
I am broadly interested in the theory, algorithms, and applications of machine learning, as well as in non-convex and convex optimization.
Recently, I have been dedicated to using theory to design algorithms elegantly.
Specifically, my recent research topics are:
Deep Learning Theory: theory and theory-inspired algorithms.
Expressivity: Exploring the expressive power of Transformers through the lens of approximation theory [8][12]; the expressivity of state-space models.
Optimization: When training neural networks, why can optimization algorithms converge to global minima? [2][4][11][12]
Implicit Bias: When training neural networks, why can optimization algorithms converge to global minima with favorable generalization ability (even without any explicit regularization)? Examples include flat-minima bias [3][5][9][10][11] and max-margin bias [4][6].
Generalization: How to measure the generalization ability of neural networks. [1]
Algorithm Design: For machine learning problems, design new optimization algorithms that (i) converge faster [10] and (ii) generalize better [6][10].
Transformers and Large Language Models: theory and algorithms. [8][10][12]
Expressive Power: The expressive power and mechanisms of Transformers [8][12]; the mechanisms of in-context learning [12]; the expressivity of state-space models.
Algorithm Design: Design faster optimizers for training LLMs [10]; design more efficient model architectures; design more efficient strategies for data selection.
Non-convex and Convex Optimization: theory and algorithms. [2][4][6][10][11][12]
Convex Optimization in ML. [6]
Non-convex Optimization in ML. [2][4][10][11][12]
Algorithm Design: Design faster optimizers for training neural networks [10]; accelerate convergence for problems with specific structure [6].
CV and NLP: algorithms and applications. [7]
Recent Publications
[10] Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu
2024 Conference on Neural Information Processing Systems (NeurIPS 2024), 1-35.
[9] Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent
Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu
2024 Conference on Neural Information Processing Systems (NeurIPS 2024), 1-26.
[8] Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
Mingze Wang, Weinan E
2024 Conference on Neural Information Processing Systems (NeurIPS 2024), 1-70.
[7] Are AI-Generated Text Detectors Robust to Adversarial Perturbations?
Guanhua Huang, Yuchen Zhang, Zhe Li, Yongjian You, Mingze Wang, Zhouwang Yang
2024 Annual Meeting of the Association for Computational Linguistics (ACL 2024), 1-20.
[6] Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Mingze Wang, Zeping Min, Lei Wu
2024 International Conference on Machine Learning (ICML 2024), 1-38.
[5] A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang, Lei Wu
NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning (NeurIPS 2023 - M3L), 1-30.
[4] Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks
Mingze Wang, Chao Ma
2023 Conference on Neural Information Processing Systems (NeurIPS 2023) (Spotlight, top 3.5%), 1-94.
[3] The alignment property of SGD noise and how it helps select flat minima: A stability analysis
Lei Wu, Mingze Wang, Weijie J. Su
2022 Conference on Neural Information Processing Systems (NeurIPS 2022), 1-25.
[2] Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks
Mingze Wang, Chao Ma
2022 Conference on Neural Information Processing Systems (NeurIPS 2022), 1-73.
Recent Preprints
* indicates equal contribution.
[12] How Transformers Implement Induction Heads: Approximation and Optimization Analysis
Mingze Wang*, Ruoxi Yu*, Weinan E, Lei Wu
arXiv preprint, 1-39, Oct 2024.
[11] Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou*, Mingze Wang*, Yuchen Mao, Bingrui Li, Junchi Yan
arXiv preprint, 1-24, Oct 2024.
[1] Generalization Error Bounds for Deep Neural Networks Trained by SGD
Mingze Wang, Chao Ma
arXiv preprint, 1-32, June 2022.
Co-authors