That problem is smooth.
And if it’s not, it is differentiable everywhere.
And if it’s not, we avoid the kinks almost surely.
And if we don’t, what is computed is a subgradient.
And if it’s not, it approximates one.
And if that’s not true, who cares? The loss went down.