Informal Systems Seminar (ISS), Centre for Intelligent Machines (CIM) and Groupe d'Études et de Recherche en Analyse des Décisions (GERAD)

A PDE approach to regularization in Deep Learning


Adam M. Oberman
Department of Mathematics and Statistics, McGill University

September 29, 2017 at 11:00 AM
McConnell Engineering Room 437

The fundamental tool for training deep neural networks is Stochastic Gradient Descent (SGD). In this talk we discuss an algorithmically simple modification of SGD which significantly improves both the training time and the generalization error on benchmark DNNs. We also discuss a related algorithm which allows for effective training of DNNs in parallel. Mathematically, we make a connection to Stochastic Optimal Control and the related nonlinear PDEs for the value function, the Hamilton-Jacobi-Bellman equations. The PDE interpretation allows us to prove that the algorithm improves the training time. Further connections with PDEs and nonconvex optimization allow us to determine optimal values of the hyper-parameters of the algorithm, which lead to further improvements.
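
The abstract does not name the specific SGD modification, so the sketch below is an illustration only: it contrasts plain SGD with an Entropy-SGD-style update, one well-known PDE-motivated modification in which the gradient of a smoothed ("local entropy") loss is estimated by a short inner Langevin loop. The toy loss, hyper-parameter values, and function names (grad_f, sgd, entropy_sgd) are assumptions for demonstration, not the speaker's algorithm.

```python
# Illustrative sketch only: not the algorithm from the talk.
# Compares plain SGD with an Entropy-SGD-style update on a toy nonconvex loss.
import numpy as np

rng = np.random.default_rng(0)


def grad_f(x):
    """Gradient of a toy nonconvex loss f(x) = x**2 / 2 + cos(5 x),
    standing in for a noisy DNN training loss."""
    return x - 5.0 * np.sin(5.0 * x)


def sgd(x0, lr=0.05, steps=300, noise=0.3):
    """Plain SGD: x <- x - lr * (stochastic gradient)."""
    x = x0
    for _ in range(steps):
        x -= lr * (grad_f(x) + noise * rng.standard_normal())
    return x


def entropy_sgd(x0, lr=0.05, steps=300, gamma=1.0, inner=20, eta=0.05,
                noise=0.3, temperature=1e-3):
    """Entropy-SGD-style update: the outer step descends gamma * (x - <y>),
    an estimate of the gradient of the smoothed (local-entropy) loss, where
    <y> is a running average of an inner Langevin chain centred at x."""
    x = x0
    for _ in range(steps):
        y, y_avg = x, x
        for _ in range(inner):
            gy = grad_f(y) + noise * rng.standard_normal()
            # Langevin step on f(y) + (gamma / 2) * (y - x)**2
            y -= eta * (gy + gamma * (y - x))
            y += np.sqrt(2.0 * eta * temperature) * rng.standard_normal()
            y_avg = 0.75 * y_avg + 0.25 * y  # running average of the chain
        x -= lr * gamma * (x - y_avg)        # descend the smoothed gradient
    return x


if __name__ == "__main__":
    x0 = 3.0
    print("plain SGD   :", sgd(x0))
    print("entropy-SGD :", entropy_sgd(x0))
```

In this toy setting the inner averaging damps the oscillations of the nonconvex term, which is the qualitative effect the smoothed (Hamilton-Jacobi) viewpoint is meant to capture; the actual hyper-parameter choices discussed in the talk are derived from the PDE analysis rather than set by hand.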