Generalization bounds for randomized learning with application to stochastic gradient descent
Randomized algorithms are central to modern machine learning. In the presence of massive datasets, researchers often turn to stochastic optimization to solve learning problems. Of particular interest is stochastic gradient descent (SGD), a first-order method that approximates the learning objective and its gradient by a random point estimate. A classical question in learning theory is whether a model learned by a randomized algorithm from a finite training sample will generalize to the data's generating distribution.
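To make the setting concrete, the following is a minimal sketch of SGD on a least-squares objective; the synthetic dataset, step size, and epoch count are illustrative assumptions, not part of the source. At each step, the gradient on a single randomly chosen example serves as an unbiased point estimate of the full empirical-risk gradient.

```python
import numpy as np

# Synthetic regression data (illustrative assumption).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd(X, y, lr=0.01, epochs=50, seed=0):
    """Plain SGD on the least-squares risk (1/2n) * sum (x_i.w - y_i)^2."""
    order_rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # The random example order is the algorithm's source of randomness.
        for i in order_rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient on one example
            w -= lr * grad
    return w

w_hat = sgd(X, y)
```

Because each per-example gradient is an unbiased estimate of the full gradient, the iterates drift toward the empirical risk minimizer; the generalization question is whether this randomly produced `w_hat` also performs well on fresh draws from the data distribution.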