As we aim to find the minimum of $f(x, y)$, we can take it directly as the objective function, which reads
$$f(x,y)=(x-y)^4+2x^2+y^2-x+2y$$and implement it as
obj_fun = lambda x: (x[0]-x[1])**4 + 2*x[0]**2 + x[1]**2 - x[0] + 2*x[1]
To employ gradient descent in the multivariate case, we analytically derive the partial derivatives of $f$, given by
$$\frac{\partial f}{\partial x}(x,y)=4(x-y)^3 + 4x-1$$$$\frac{\partial f}{\partial y}(x,y)=-4(x-y)^3 + 2y+2$$These are rewritten in vector form as
$$\nabla \mathbf{F}(\mathbf{x}) = \begin{bmatrix} \frac{\partial f}{\partial x}(x,y)\\ \frac{\partial f}{\partial y}(x,y)\end{bmatrix} = \begin{bmatrix} 4(x-y)^3 + 4x-1 \\ -4(x-y)^3 + 2y+2 \end{bmatrix} $$and implemented as
gradient = lambda x: [4*(x[0]-x[1])**3 + 4*x[0]-1, -4*(x[0]-x[1])**3 + 2*x[1]+2]
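As a quick sanity check, the analytic gradient can be compared against a central finite-difference approximation of $f$. The helper fd_gradient, the test point and the step size h below are illustrative choices added for this check and are not part of the derivation above.
import numpy as np

def fd_gradient(f, x, h=1e-6):
    # central finite differences, one coordinate at a time
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

x_test = [0.5, -0.5]
print(gradient(x_test))              # analytic gradient
print(fd_gradient(obj_fun, x_test))  # finite-difference approximation
Both outputs should agree to several decimal places if the partial derivatives above are correct.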
Gradient descent applies to multivariate functions of dimension $m$, i.e. $\mathbf{x}\in\mathbb{R}^m$, via the update rule
$$\mathbf{x}_{k+1}=\mathbf{x}_k-\gamma \nabla \mathbf{F}(\mathbf{x}_k), \qquad k \in \mathbb{N}$$where $k$ is the iteration index, $\nabla \mathbf{F}(\mathbf{x}_k)$ the gradient evaluated at the candidate $\mathbf{x}_k$, and $\gamma$ the learning rate. An implementation of gradient descent is given below.
import numpy as np

def multivar_descent(x, gradient, l=0.02, tol=1e-10, eps=1):
    x_list = []
    # iterate while the error stays above tol and below the divergence guard of 100
    while tol < eps < 100:
        x_list.append(x.copy())
        g = gradient(x)
        x -= np.multiply(l, g)   # step of length l against the gradient
        eps = np.sum(np.abs(g))  # error: L1 norm of the gradient
    return x_list, eps
For minimization, the Newton-Raphson approach can be generalized to $m$ variables using the first-order derivative $\nabla\mathbf{F}(\mathbf{x})\in\mathbb{R}^m$ from above and, as the second-order derivative, the Hessian matrix $\mathbf{H}(\mathbf{x}) \in \mathbb{R}^{m\times m}$, which in our case is given by
$$ \mathbf H(\mathbf{x}) = \begin{bmatrix} \dfrac{\partial^2 f(x,y)}{\partial x^2} & \dfrac{\partial^2 f(x,y)}{\partial x \partial y}\\[1em] \dfrac{\partial^2 f(x,y)}{\partial y \partial x} & \dfrac{\partial^2 f(x,y)}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 12(x-y)^2 + 4 & -12(x-y)^2 \\ -12(x-y)^2 & 12(x-y)^2 + 2 \end{bmatrix} $$and implemented as
hessian = lambda x: [[12*(x[0]-x[1])**2 + 4, -12*(x[0]-x[1])**2], [-12*(x[0]-x[1])**2, 12*(x[0]-x[1])**2 + 2]]
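Both diagonal entries of $\mathbf H(\mathbf{x})$ are strictly positive and its determinant evaluates to $72(x-y)^2+8>0$, so the Hessian is positive definite for every $(x, y)$ and therefore always invertible. A minimal numerical check of this property (the sample points below are arbitrary choices for illustration) could look as follows:
import numpy as np

# eigenvalues of the symmetric Hessian at a few arbitrary sample points;
# all of them should be strictly positive
for pt in ([0., 0.], [1., -1.], [-0.5, 0.8]):
    print(pt, np.linalg.eigvalsh(np.array(hessian(pt))))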
One thus obtains the iterative scheme
$$\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma \big[\mathbf H(\mathbf{x}_k)\big]^{-1} \nabla\mathbf{F}(\mathbf{x}_k), \qquad k \in \mathbb{N}$$where the reciprocal of the second derivative from the univariate case is replaced by the inverse of the Hessian matrix. This approach often uses a learning rate $0 < \gamma < 1$ instead of $\gamma=1$ to help ensure that the Wolfe conditions are satisfied at each iteration. For step sizes other than 1, the method is referred to as the relaxed or damped Newton's method, whose implementation is provided hereafter.
def newton_hessian(x, gradient, hessian, l=1, tol=1e-10, eps=1):
    x_list = []
    # iterate while the error stays above tol and below the divergence guard of 100
    while tol < eps < 100:
        x_list.append(x.copy())
        g = gradient(x)
        H = hessian(x)
        x -= l*np.dot(np.linalg.inv(H), g)  # damped Newton step
        eps = np.sum(np.abs(g))             # error: L1 norm of the gradient
    return x_list, eps
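The implementation above applies a fixed step size $\gamma$. The sufficient-decrease part of the Wolfe conditions mentioned earlier could instead be enforced adaptively with a backtracking (Armijo) line search. The following is only a sketch of such a variant and not the scheme used in the remainder of this section; the parameters c and beta are conventional but arbitrary choices.
def newton_backtracking(x, obj_fun, gradient, hessian, c=1e-4, beta=0.5,
                        tol=1e-10, max_iter=100):
    # hypothetical damped-Newton variant: shrink the step t until the
    # sufficient-decrease condition f(x + t*d) <= f(x) + c*t*g.T@d holds
    x = np.asarray(x, dtype=float)
    x_list = [x.copy()]
    eps = np.inf
    for _ in range(max_iter):
        g = np.asarray(gradient(x), dtype=float)
        eps = np.sum(np.abs(g))  # error: L1 norm of the gradient
        if eps < tol:
            break
        d = -np.linalg.solve(np.array(hessian(x)), g)  # full Newton direction
        t = 1.
        while obj_fun(x + t*d) > obj_fun(x) + c*t*g.dot(d):
            t *= beta  # backtrack until sufficient decrease is met
        x = x + t*d
        x_list.append(x.copy())
    return x_list, eps
Using np.linalg.solve to obtain the Newton direction avoids forming the explicit inverse and is generally the more numerically stable choice.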
# run gradient descent and the damped Newton method from the same starting point
start_vals = [1., -1.]
tol = 1e-11
l = 0.02
pts_dsc, eps = multivar_descent(start_vals, gradient, l, tol=tol)
print('Gradient descent solution found at %s in %s iterations with error %s.' % (pts_dsc[-1], len(pts_dsc), eps))
l = 0.1
pts_hes, eps = newton_hessian(start_vals, gradient, hessian, l, tol=tol)
print('Newton method solution found at %s in %s iterations with error %s.' % (pts_hes[-1], len(pts_hes), eps))
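As an optional cross-check, which is an addition to the workflow above and assumes SciPy is installed, a library solver can be run on the same objective and should land on essentially the same minimizer.
from scipy.optimize import minimize

# cross-check with SciPy's BFGS solver (assumes SciPy is available)
res = minimize(obj_fun, start_vals, jac=gradient, method='BFGS')
print('SciPy BFGS solution found at %s in %s iterations.' % (res.x, res.nit))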
%matplotlib notebook
import matplotlib.pyplot as plt
# plot the component sum x_k + y_k of each candidate against the iteration index k
plt.semilogx(range(len(pts_dsc)), np.sum(pts_dsc, -1), label='Gradient Descent')
plt.semilogx(range(len(pts_hes)), np.sum(pts_hes, -1), label='Newton Method')
#plt.xlim([1, max(len(pts_dsc), len(pts_hes))])
plt.xlabel('Iterations $k$')
plt.ylabel(r'Component sum $x_k + y_k$')
plt.legend()
Gradient descent solution found at [ 0.03349047 -0.56698094] in 433 iterations with error 9.920952948050399e-12.
Newton method solution found at [ 0.03349047 -0.56698094] in 283 iterations with error 9.394707234378075e-12.
# compute grid
s = 1
x = np.linspace(-1*s, 1*s, int(20*s)+1)
y = np.linspace(-1*s, 1*s, int(20*s)+1)
xx, yy = np.meshgrid(x, y)
zz = obj_fun([xx, yy])
# import plot libs
%matplotlib inline
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
# plot functions
fig, ax = (plt.figure(figsize=(8, 3)), plt.axes(projection='3d'))
ax.scatter(xx, yy, zz, c=zz/zz.max(), cmap=cm.jet, linewidth=.5, s=5, label='$f(x, y)$')
#ax.plot_wireframe(xx, yy, zz, cmap=cm.jet, linewidth=.5, color='orange', label='$f(x, y)$')
#ax.plot_trisurf(xx.flatten(), yy.flatten(), zz.flatten(), cmap=cm.jet, linewidth=.2, label='$f(x, y)$')
# initial guess and solution
pts = pts_hes #pts_dsc
ax.scatter(pts[0][0], pts[0][1], obj_fun(pts[0]), s=40, marker='x', color='k', label='Initial guess')
ax.scatter(pts[-1][0], pts[-1][1], obj_fun(pts[-1]), s=40, marker='x', color='red', label='Solution')
# plot gradient steps
x_p = pts[0]
z_p = obj_fun(x_p)
for x_k in pts[1:min(25, len(pts))]:
    z_k = obj_fun(x_k)
    ax.quiver(x_p[0], x_p[1], z_p, x_k[0]-x_p[0], x_k[1]-x_p[1], z_k-z_p, linewidth=1, color='green')
    x_p, z_p = x_k, z_k
ax.legend()