The following peer review was solicited as part of the Distill review process. Some points in this review were clarified by an editor after consulting the reviewer.
The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer for taking the time to review this article.
Conflicts of Interest: Reviewer disclosed no conflicts of interest.
-
Typo: In the section "Optimal Parameters" the labels "Convergence rate, Gradient Descent" and "Convergence rate, Momentum" are switched.
-
Note that the optimal convergence rate doesn't hold outside quadratics; for general strongly convex functions one often needs to use Nesterov's method instead. A simple counter-example can be found in section 4.6 of https://arxiv.org/pdf/1408.3595.pdf
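To make the point concrete, here is a minimal sketch of this kind of counter-example. The piecewise-linear gradient, the constants m = 1 and L = 25, and the starting point 3.3 are assumptions written from memory of the cited section and should be checked against the paper; the function names are hypothetical. The objective is strongly convex with minimizer 0, yet momentum tuned with the quadratic-optimal parameters settles into a three-point cycle instead of converging.

```python
from math import sqrt

def grad(x):
    # Assumed piecewise-linear gradient: continuous, slopes between m = 1 and L = 25,
    # so the underlying function is strongly convex with its minimum at x = 0.
    if x < 1:
        return 25 * x
    if x < 2:
        return x + 24
    return 25 * x - 24

def heavy_ball(x0, steps, m=1.0, L=25.0):
    # Step size and momentum that are optimal for quadratics with curvature in [m, L].
    alpha = 4 / (sqrt(L) + sqrt(m)) ** 2
    beta = ((sqrt(L) - sqrt(m)) / (sqrt(L) + sqrt(m))) ** 2
    prev = cur = x0
    xs = [x0]
    for _ in range(steps):
        nxt = cur - alpha * grad(cur) + beta * (cur - prev)
        prev, cur = cur, nxt
        xs.append(cur)
    return xs

trajectory = heavy_ball(3.3, 300)
# The last three iterates, sorted; they stay away from the minimizer at 0.
tail = sorted(trajectory[-3:])
print(tail)
```

Working the fixed-point equations by hand gives a cycle through roughly 0.65, -1.80 and 2.12, which is what the iterates approach from this starting point.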
-
In the first diagram of "The Dynamics of Momentum," we can see that there are regimes where only the momentum parameter matters for the convergence rate. This is surprising and may be worth an additional note. Is it worth writing out the equation for the eigenvalues explicitly, maybe in a footnote, and showing how the learning rate cancels out in the complex case?
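For reference, a sketch of the cancellation, assuming the per-eigenvalue iteration has characteristic polynomial σ² - (1 + β - αλ)σ + β (the standard form for momentum on a quadratic; here α is the learning rate, β the momentum, λ an eigenvalue of the quadratic, and the function name is hypothetical). The product of the two roots is β, so whenever they form a complex-conjugate pair, which happens for (1 - √β)² < αλ < (1 + √β)², each has modulus √β regardless of α. A quick numerical check:

```python
import cmath

def momentum_eigenvalues(alpha, beta, lam):
    # Roots of sigma^2 - (1 + beta - alpha*lam) * sigma + beta = 0.
    trace = 1 + beta - alpha * lam
    disc = cmath.sqrt(trace ** 2 - 4 * beta)
    return (trace + disc) / 2, (trace - disc) / 2

beta, lam = 0.81, 1.0
# Every alpha below lies inside the complex regime (0.01 < alpha*lam < 3.61).
moduli = []
for alpha in (0.1, 0.5, 1.0, 2.0):
    s1, s2 = momentum_eigenvalues(alpha, beta, lam)
    moduli.append((abs(s1), abs(s2)))
    print(alpha, abs(s1), abs(s2))  # both moduli equal sqrt(beta) up to rounding
```

Outside that interval the roots are real and the learning rate does affect the rate, which matches the other regions of the diagram.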
-
Often, people think about Momentum in terms of Chebyshev Polynomials. This doesn't seem necessary to cover, but it might be good to reference somewhere.
-
In the section "The Limits of Descent" the author writes "Like many such lower bounds, this result must not be taken literally, but spiritually." This is a good point to make, but the author may wish to consider whether it may strike the wrong chord with people talking about taking Donald Trump "literally but not seriously."
-
"Many of our favorite methods, including BFGS, and more, do not fall into the class of linear first order methods." Note that BFGS and Conjugate Gradient are the same thing when restricted to quadratics (with exact line searches). (LBFGS is not the same thing, however.)