Deep Learning Optimizer’s Prayer

27 Apr 2025

That problem is smooth.

And if it’s not, it is differentiable everywhere.

And if it’s not, we avoid the kinks almost surely.

And if we don’t, what is computed is a subgradient.

And if it’s not, it approximates one.

And if that’s not true, who cares? The loss went down.
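For the literal-minded: the middle petitions can be checked on ReLU’s kink at zero. A hand-rolled sketch (the names here are illustrative, not any framework’s API):

```python
import random

def relu(x):
    return max(x, 0.0)

def relu_grad(x):
    # Away from the kink this is the true derivative.
    # At x == 0, ReLU is not differentiable, but returning 0.0 is still
    # a valid subgradient: the subdifferential there is the interval [0, 1].
    return 1.0 if x > 0 else 0.0

# "We avoid the kinks almost surely": the kink is a single point,
# so a continuously sampled input essentially never lands on it.
samples = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
hits = sum(1 for x in samples if x == 0.0)  # almost surely 0
```

The last petition, of course, admits no unit test.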