In the previous post, we examined some of Cauchy's thoughts on gradient descent. Here, let's implement them for real!

To recap, Cauchy postulated three approaches that would guarantee gradient descent finds (at least a local) minimum.

  1. Backtracking Line Search - This is the most obvious method: we start with a "large" value of \(\theta\) and keep decreasing it until the new point improves over the previous estimate (sketched in code after this list).
  2. Steepest Descent - Determine \(\theta\) analytically by solving $$\frac{\partial f(x-\theta f^\prime(x, ., .), y-\theta f^\prime(., y, .), z-\theta f^\prime(., ., z))}{\partial \theta} = 0$$ which can be done with any root-finding algorithm (sketched in code after this list).
  3. The last approach presumes that the current estimate is already "close" to the ideal one, with \(f\) itself close to \(0\), and uses that approximation directly to find the ideal value of \(\theta\): $$\theta = \frac{f(x, y, z)}{(f^\prime(x, ., .))^2 + (f^\prime(., y, .))^2 + (f^\prime(., ., z))^2}$$ (sketched in code after this list).
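
Here is a minimal sketch of the first approach in Python. The names (`f`, `grad_f`, `backtracking_step`), the shrink factor `beta`, and the quadratic test function are all illustrative choices of mine, not anything Cauchy prescribed:

```python
import numpy as np

def backtracking_step(f, grad_f, x, theta0=1.0, beta=0.5, max_halvings=50):
    """One gradient step: shrink theta until the move improves on f(x)."""
    g = grad_f(x)
    theta = theta0
    for _ in range(max_halvings):
        if f(x - theta * g) < f(x):    # new point improves over the previous estimate
            return x - theta * g, theta
        theta *= beta                  # theta too large: keep decreasing it
    return x, 0.0                      # no improving step found (e.g. already at a minimum)

# Illustrative test problem: a quadratic bowl with its minimum at the origin.
f = lambda p: p[0]**2 + 2 * p[1]**2 + 3 * p[2]**2
grad_f = lambda p: np.array([2 * p[0], 4 * p[1], 6 * p[2]])

x = np.array([1.0, 1.0, 1.0])
for _ in range(25):
    x, theta = backtracking_step(f, grad_f, x)
print(x, f(x))   # both should end up close to zero
```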
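
The second approach can be sketched in the same style. The root of \(\frac{\partial}{\partial \theta} f(x - \theta f^\prime)\) is found here with plain bisection after doubling an upper bracket; that particular root finder is my choice, and any other would do:

```python
import numpy as np

def steepest_descent_step(f, grad_f, x, tol=1e-10):
    """One step with theta chosen by solving d/d(theta) f(x - theta*g) = 0."""
    g = grad_f(x)
    # Derivative of theta -> f(x - theta*g), by the chain rule.
    dphi = lambda theta: -grad_f(x - theta * g) @ g
    # dphi(0) = -|g|^2 <= 0; double theta_hi until the slope turns positive
    # (this assumes f eventually grows again along the ray).
    theta_hi = 1.0
    while dphi(theta_hi) < 0:
        theta_hi *= 2.0
    lo, hi = 0.0, theta_hi
    while hi - lo > tol:               # plain bisection on the sign change
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return x - theta * g, theta

f = lambda p: p[0]**2 + 2 * p[1]**2 + 3 * p[2]**2
grad_f = lambda p: np.array([2 * p[0], 4 * p[1], 6 * p[2]])

x = np.array([1.0, 1.0, 1.0])
for _ in range(10):
    x, theta = steepest_descent_step(f, grad_f, x)
print(x, f(x))
```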
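
And a sketch of the third approach, which simply plugs the current function value and gradient into the formula above. It relies on the minimum value of \(f\) being roughly \(0\), as the approximation requires; the guard against a vanishing gradient is my addition:

```python
import numpy as np

def approx_theta_step(f, grad_f, x):
    """One step with theta = f / |grad f|^2, valid when min(f) is about 0."""
    g = grad_f(x)
    denom = g @ g                      # (f_x')^2 + (f_y')^2 + (f_z')^2
    if denom == 0.0:                   # gradient vanished: already at a stationary point
        return x, 0.0
    theta = f(x) / denom
    return x - theta * g, theta

f = lambda p: p[0]**2 + 2 * p[1]**2 + 3 * p[2]**2
grad_f = lambda p: np.array([2 * p[0], 4 * p[1], 6 * p[2]])

x = np.array([0.1, 0.1, 0.1])          # start "close" to the minimum, as the method assumes
for _ in range(10):
    x, theta = approx_theta_step(f, grad_f, x)
print(x, f(x))
```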