期刊名称:Proceedings of the National Academy of Sciences
印刷版ISSN:0027-8424
电子版ISSN:1091-6490
出版年度:2016
卷号:113
期号:47
页码:E7351-E7358
DOI:10.1073/pnas.1614734113
语种:English
出版社:The National Academy of Sciences of the United States of America
摘要:SignificanceOptimization problems arise naturally in statistical machine learning and other fields concerned with data analysis. The rapid growth in the scale and complexity of modern datasets has led to a focus on gradient-based methods and also on the class of accelerated methods, first proposed by Nesterov in 1983. Accelerated methods achieve faster convergence rates than gradient methods and indeed, under certain conditions, they achieve optimal rates. However, accelerated methods are not descent methods and remain a conceptual mystery. We propose a variational, continuous-time framework for understanding accelerated methods. We provide a systematic methodology for converting accelerated higher-order methods from continuous time to discrete time. Our work illuminates a class of dynamics that may be useful for designing better algorithms for optimization. Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. Although many generalizations and extensions of Nesterovs original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian, which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limit of all of these methods corresponds to traveling the same curve in spacetime at different speeds. From this perspective, Nesterovs technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms.