Authors: Michal Rolinek, Georg Martius
Is Adam really the best we can do? A simple enough update rule can dramatically outperform Adam on some datasets. The optimizer turned out not to be very robust, but it had its moments, such as driving the training loss on MNIST all the way to 0.0 within 20 epochs.
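To give a flavor of what a loss-based stepsize rule can look like, here is a minimal sketch (an illustration, not the authors' exact algorithm): the step size is chosen as the one that would reach a target minimum loss `l_min` if the loss were locally linear along the gradient. The values of `alpha` and `l_min`, and the toy quadratic objective, are assumptions for demonstration only.

```python
import numpy as np

def loss_based_step(theta, loss_fn, grad_fn, alpha=0.15, l_min=0.0):
    """One step of an illustrative loss-based stepsize rule.

    Hypothetical rule: eta = alpha * (L(theta) - l_min) / ||g||^2,
    i.e. a fraction alpha of the step that would reach l_min
    under a local linearization of the loss.
    """
    loss = loss_fn(theta)
    g = grad_fn(theta)
    eta = alpha * (loss - l_min) / (np.dot(g, g) + 1e-12)  # guard against zero gradient
    return theta - eta * g

# Toy quadratic objective: L(theta) = 0.5 * ||theta||^2, minimized at 0.
loss_fn = lambda t: 0.5 * np.dot(t, t)
grad_fn = lambda t: t

theta = np.array([3.0, -4.0])
for _ in range(50):
    theta = loss_based_step(theta, loss_fn, grad_fn)
```

On this toy problem the loss shrinks geometrically without any manual learning-rate tuning, since the step size automatically scales with how far the current loss is from the target.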
> Tired of tuning parameters of SGD or Adam for #DeepLearning? Our new optimizer (https://t.co/90hi80ghna) works much better than the best constant learning rates. Try it out: #Tensorflow code included, see https://t.co/k4YVzeqJrF
> — Georg Martius (@GMartius), February 19, 2018