Abstract
This paper introduces an algorithm inspired by the work of Franceschi et al. (2017) for automatically tuning the learning rate while training neural networks. We formalize this problem as minimizing a given performance metric (e.g., validation error) at a future epoch using its “hyper-gradient” with respect to the learning rate at the current iteration. Such a hyper-gradient is difficult to estimate, and we discuss how approximations and Hessian-vector products allow us to develop a Real-Time method for Hyper-Parameter Optimization (RT-HPO). We present a comparison between RT-HPO and other popular HPO techniques and show that our approach performs better in terms of the final accuracy of the trained model. Online adaptation of the learning rate introduces two extra hyper-parameters, the initial value of the learning rate and the hyper-learning rate; our empirical results demonstrate that the accuracy obtained by RT-HPO is largely insensitive to these hyper-parameters.
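To make the idea concrete, below is a minimal sketch of online hyper-gradient adaptation of the learning rate under strong simplifying assumptions: a one-step look-ahead on the training loss of a toy quadratic objective, rather than a future-epoch validation metric, and without the approximations and Hessian-vector products that RT-HPO uses. The names (`lr`, `hyper_lr`, `grad`) are illustrative, not the paper's notation.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss L(w) = 0.5 * ||w||^2."""
    return w

w = np.array([5.0, -3.0])
lr = 0.01          # initial learning rate (one of the two extra hyper-parameters)
hyper_lr = 0.001   # hyper-learning rate (the other extra hyper-parameter)
prev_grad = np.zeros_like(w)

for step in range(100):
    g = grad(w)
    # One-step hyper-gradient of the loss w.r.t. the learning rate used at the
    # previous update: since w_t = w_{t-1} - lr * grad(w_{t-1}),
    # dL(w_t)/d(lr) = grad(w_t) . (-grad(w_{t-1})).
    hyper_grad = -np.dot(g, prev_grad)
    lr = max(lr - hyper_lr * hyper_grad, 0.0)  # online learning-rate update
    w = w - lr * g                             # ordinary gradient step
    prev_grad = g
```

The learning rate grows while successive gradients point in similar directions and shrinks when they oppose each other; RT-HPO extends this one-step view to a multi-epoch horizon on a validation metric.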