Implicit Biases of Large Learning Rates in Machine Learning

When and Where

Tuesday, July 09, 2024 2:00 pm to 3:00 pm
Room 9199
Ontario Power Building
700 University Avenue, Toronto, ON M5G 1Z5

Speakers

Molei Tao, Georgia Tech

Description

This talk will discuss some nontrivial but often pleasant effects of large learning rates, through the lens of nonlinear training dynamics. Large learning rates are commonly used in machine learning practice for improved empirical performance, but they defy traditional theoretical analyses. I will first quantify how large learning rates help gradient descent escape local minima in multiscale landscapes. This escape is driven by chaotic dynamics, which provides an alternative to the commonly known escape mechanism due to noise from stochastic gradients. I will then report how large learning rates provably bias gradient descent toward flatter minimizers. Several related, perplexing phenomena have been empirically observed recently, including Edge of Stability, loss catapulting, and balancing. I will unify them and explain that they are all algorithmic implicit biases of large learning rates. These results are enabled by a new global convergence result for gradient descent on certain nonconvex functions without Lipschitz gradient. The theory also explains when Edge of Stability and other implicit biases of large learning rates will arise.
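For intuition on the flat-minimum bias described above, the following is a minimal illustrative sketch, not code from the talk: gradient descent on toy quadratic wells f(x) = 0.5 * a * x^2 with different curvatures a. With step size eta, the update x <- (1 - eta*a) x is stable only when a < 2/eta, so a large learning rate can settle only at sufficiently flat minima and catapults away from sharp ones. The function name, curvature values, and step size below are hypothetical choices made purely for illustration.

    # Illustrative sketch: gradient descent on f(x) = 0.5 * curvature * x**2.
    # GD with step size eta is stable only if curvature < 2/eta, so a large
    # learning rate can only converge near sufficiently flat minima.

    def gd(curvature, eta, x0=1.0, steps=50):
        """Run gradient descent on f(x) = 0.5 * curvature * x**2."""
        x = x0
        for _ in range(steps):
            x -= eta * curvature * x  # gradient of f at x is curvature * x
        return x

    eta = 0.5                 # a "large" step size: stability requires curvature < 2/eta = 4
    sharp, flat = 10.0, 1.0   # hypothetical curvatures of a sharp and a flat minimum

    print(gd(sharp, eta))     # |1 - eta*sharp| = 4 > 1: iterates blow up (catapult away)
    print(gd(flat, eta))      # |1 - eta*flat| = 0.5 < 1: converges to the flat minimum

The same step size thus acts as a filter on curvature, which is one way to read the claim that large learning rates implicitly bias training toward flatter minimizers.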
