L4: Practical loss-based stepsize adaptation for deep learning

NIPS 2018

Authors: Michal Rolinek, Georg Martius

Is Adam really the best we can do? A simple enough update rule can dramatically outperform Adam on some datasets. The optimizer turned out not to be very robust, but it had its moments, such as actually driving the training loss on MNIST to 0.0 within 20 epochs.
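Roughly, the rule linearizes the loss along the update direction and picks the stepsize that would close a fixed fraction of the gap between the current loss and an estimate of the minimal attainable loss. Below is a minimal NumPy sketch of that idea on a toy quadratic; the hyperparameter values and the simplified handling of `L_min` are my assumptions, not the paper's exact algorithm (which also tracks the running minimum more carefully and applies the rule on top of Adam- or momentum-style directions).

```python
import numpy as np

# Toy sketch of loss-based stepsize adaptation in the spirit of L4,
# using plain gradient descent as the update direction.
# `alpha` and the handling of `L_min` are simplified assumptions.

def loss(theta):
    return 0.5 * np.sum(theta ** 2)

def grad(theta):
    return theta

theta = np.array([3.0, -2.0])
alpha = 0.15        # fraction of the loss gap to remove per step (illustrative)
L_min = np.inf      # running estimate of the lowest attainable loss

print(f"initial loss: {loss(theta):.4f}")
for step in range(500):
    L = loss(theta)
    g = grad(theta)
    v = g  # update direction; the paper also supports Adam/momentum directions
    # Keep the target slightly below the best loss seen so far, so the gap
    # (and hence the stepsize) never collapses to zero -- a crude stand-in
    # for the paper's running-minimum estimate.
    L_min = min(L_min, 0.9 * L)
    # A linear model of the loss along v drops by eta * <g, v>; choose eta
    # so that an alpha-fraction of the gap (L - L_min) is removed per step.
    eta = alpha * (L - L_min) / (np.dot(g, v) + 1e-12)
    theta = theta - eta * v
print(f"final loss:   {loss(theta):.4f}")
```

Note how the stepsize adapts on its own: it grows when the loss sits far above the estimated minimum and shrinks as the gap closes, which is what lets the method push the training loss essentially to zero when the gap estimate is good.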

Links: arXiv, GitHub