Convergence of gradient descent algorithms


In this post, I review the convergence proofs of gradient algorithms. Our main reference is: Leon Bottou, Online learning and stochastic approximations. I rewrite the proofs described in Bottou's paper but with more details about the points which are subtle to me. I tried to write the proofs as clear as possible so as to make them accessible to everyone.