October 21, 2009: Yoshua Bengio

University of Montreal

On Training Deep Neural Networks

Whereas theoretical work suggests that deep architectures might be more efficient at representing highly varying functions, training deep architectures was unsuccessful until the recent advent of algorithms based on unsupervised pre-training. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. We attempt to shed some light on these questions in several ways, by comparing different successful approaches to training deep architectures and through extensive simulations investigating explanatory hypotheses. The experiments confirm and clarify the advantage (and sometimes disadvantage) of unsupervised pre-training. They demonstrate the robustness of the training procedure with respect to the random initialization, the positive effect of pre-training in terms of optimization and its role as a regularizer (in both cases in unusual ways). We explore explanatory hypotheses based on the notion that early growth of the model parameters is decisive, and in particular that early use of unsupervised learning places the dynamics of supervised learning in attractors associated with local minima with good generalization properties. We discuss how several training approaches for deep architectures may exploit the principle of continuation methods in order to find good local minima. In particular we suggest that this is the case for shaping or the use of a curriculum, showing that it affects both the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, the quality of the local minima obtained. Finally, we investigate the nature and evolution of gradients at different levels of a deep supervised neural network in an attempt to understand why training is sometimes slowed down, and sometimes possibly gets stuck in apparent local minima.


Yoshua Bengio (PhD'1991, McGill University) is a professor at the Department of Computer Science and Operations Research, Universite de Montreal, and Canada Research Chair in Statistical Learning Algorithms, as well as NSERC-CGI Chair, and Fellow of the Canadian Institute for Advanced Research. He was program co-chair for NIPS'2008 and is general co-chair for NIPS'2009. His main ambition is to understand how learning can give rise to intelligence. He has been an early proponent of deep architectures and distributed representations as tools to bypass the curse of dimensionality and learn complex tasks. He has contributed to many machine learning areas: neural networks, recurrent neural networks, graphical models, kernel machines, semi-supervised learning, unsupervised learning and manifold learning, pattern recognition, data mining, natural language processing, machine vision, and time-series models.
