Implicit Biases of Large Learning Rates in Machine Learning

When and Where

Tuesday, July 09, 2024 2:00 pm to 3:00 pm
Room 9199
Ontario Power Building
700 University Avenue, Toronto, ON M5G 1Z5

Speakers

Molei Tao, Georgia Tech

Description

This talk will discuss some nontrivial but often pleasant effects of large learning rates, through the lens of nonlinear training dynamics. Large learning rates are commonly used in machine learning practice for improved empirical performance, but they defy traditional theoretical analyses. I will first quantify how large learning rates help gradient descent escape local minima in multiscale landscapes. This escape is driven by chaotic dynamics, which provides an alternative to the commonly known escape mechanism due to noise from stochastic gradients. I will then report how large learning rates provably bias gradient descent toward flatter minimizers. Several related, perplexing phenomena have been empirically observed recently, including Edge of Stability, loss catapulting, and balancing. I will unify them and explain that they are all algorithmic implicit biases of large learning rates. These results are enabled by a new global convergence result for gradient descent on certain nonconvex functions without Lipschitz gradient. The theory also explains when Edge of Stability and other implicit biases of large learning rates will arise.
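For intuition on the flat-minimum bias described above, the following is a minimal illustrative sketch, not code from the talk: gradient descent on toy quadratic wells f(x) = 0.5 * a * x^2 with different curvatures a. With step size eta, the update x <- (1 - eta*a) x is stable only when a < 2/eta, so a large learning rate can settle only at sufficiently flat minima and catapults away from sharp ones. The function name, curvature values, and step size below are hypothetical choices made purely for illustration.

    # Illustrative sketch: gradient descent on f(x) = 0.5 * curvature * x**2.
    # GD with step size eta is stable only if curvature < 2/eta, so a large
    # learning rate can only converge near sufficiently flat minima.

    def gd(curvature, eta, x0=1.0, steps=50):
        """Run gradient descent on f(x) = 0.5 * curvature * x**2."""
        x = x0
        for _ in range(steps):
            x -= eta * curvature * x  # gradient of f at x is curvature * x
        return x

    eta = 0.5                 # a "large" step size: stability requires curvature < 2/eta = 4
    sharp, flat = 10.0, 1.0   # hypothetical curvatures of a sharp and a flat minimum

    print(gd(sharp, eta))     # |1 - eta*sharp| = 4 > 1: iterates blow up (catapult away)
    print(gd(flat, eta))      # |1 - eta*flat| = 0.5 < 1: converges to the flat minimum

The same step size thus acts as a filter on curvature, which is one way to read the claim that large learning rates implicitly bias training toward flatter minimizers.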
