Optimization generally involves the definition of a cost (or loss) function prior to the actual minimization, which assesses each intermediate candidate $x_k, \, k \in \mathbb{N}$, by how far it deviates from the point of convergence. We employ the squared loss here, as it is differentiable. It is given by
$$L(x_k)=\left(f^{-1}(x_k)-x\right)^2=\left(x_k^b-x\right)^2$$where $x$ represents the requested input, which is constant.
L = lambda x_k, x, b: (x_k**b - x)**2
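As a quick plausibility check (using the exemplary values $x=144$ and $b=2$, whose exact square root is $12$), the loss vanishes at the root and grows with the deviation from it:
print(L(12, 144, 2))  # 0, since 12 is the exact square root of 144
print(L(13, 144, 2))  # 625, i.e. (13**2 - 144)**2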
Twice differentiability of the loss function is an important requirement for Newton's method. The first-order derivative of our cost function is obtained from the chain rule and given by
$$\frac{\partial}{\partial x_k} L(x_k)=2b\left(x_k^{2b-1}-x\,x_k^{b-1}\right)$$which is implemented as
d1L = lambda x_k, x, b: 2*b*(x_k**(2*b - 1) - x*x_k**(b - 1))
Similarly, we obtain the second-order derivative given by
$$\frac{\partial^2}{\partial x_k^2} L(x_k)=2b\left((2b-1)\,x_k^{2b-2}-(b-1)\,x\,x_k^{b-2}\right)$$which is implemented as
d2L = lambda x_k, x, b: 2*b*((2*b - 1)*x_k**(2*b - 2) - (b - 1)*x*x_k**(b - 2))
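Both derivatives can be sanity-checked against central finite differences; the test point, step size, and the exemplary values $x=144$, $b=2$ below are chosen arbitrarily for this sketch:
# compare the analytic derivatives against central finite differences
h, x_t = 10**-4, 5.0  # step size and test point, chosen arbitrarily
fd1 = (L(x_t + h, 144, 2) - L(x_t - h, 144, 2)) / (2*h)
fd2 = (L(x_t + h, 144, 2) - 2*L(x_t, 144, 2) + L(x_t - h, 144, 2)) / h**2
print(fd1, d1L(x_t, 144, 2))  # both approx. -2380
print(fd2, d2L(x_t, 144, 2))  # both approx. -276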
Using Newton's method, we aim to converge to the stationary point $\frac{\partial}{\partial x_k} L(x_k)=0$, which is achieved via a second-order Taylor expansion of the loss, yielding the update rule
$$x_{k+1} = x_k-\left(\frac{\partial}{\partial x_k} L(x_k)\right)\left(\frac{\partial^2}{\partial x_k^2} L(x_k)\right)^{-1}$$which is implemented hereafter with a tolerance value and a maximum number of iterations as break conditions.
def root_newton(x, b, tol=10**-17, max_iter=2000):
    x_k = x / 2     # starting guess
    x_list = [x_k]  # list collecting all candidates
    # iterate until the gradient (nearly) vanishes or the iteration budget is exhausted
    while abs(d1L(x_k, x, b)) > tol and len(x_list) < max_iter:
        x_k -= d1L(x_k, x, b) / d2L(x_k, x, b)  # Newton update step
        x_list.append(x_k)
    return x_list
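Assuming SciPy is available, the implementation can be cross-checked with scipy.optimize.newton, which performs the same Newton-Raphson iteration when applied to the stationarity condition $\frac{\partial}{\partial x_k} L(x_k)=0$ with its derivative supplied (again using the exemplary values $x=144$, $b=2$):
# optional cross-check (assumes SciPy is installed)
from scipy.optimize import newton
root = newton(lambda x_k: d1L(x_k, 144, 2), x0=144/2,
              fprime=lambda x_k: d2L(x_k, 144, 2))
print(root)  # approx. 12.0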
For the sake of benchmarking, the binary search (bisection) algorithm is used in the following.
def root_binary(x, b, tol=10**-16, max_iter=2000):
    high = x / 2  # upper bracket bound (assumes x**(1/b) <= x/2, e.g. x >= 4 for b = 2)
    low = 0       # lower bracket bound
    mean = (high + low) / 2
    x_list = [mean]  # list collecting all candidates
    # halve the bracket until the residual is small enough or the iteration budget is exhausted
    while abs(mean**b - x) > tol and len(x_list) < max_iter:
        mean = (high + low) / 2
        if mean**b > x:  # root lies in the lower half of the bracket
            high = mean
        else:            # root lies in the upper half of the bracket
            low = mean
        x_list.append(mean)
    return x_list
The requested input $x\in\mathbb{R}_+$ for root computation and the respective exponent $b\in\mathbb{N}$ are set hereafter and can be varied to study their effect on the computation. From the plot below, it can be observed that the Newton method generally converges to a more accurate result in fewer iterations, whereas the binary search yields good first approximations that oscillate around the convergence point $x^{1/b}$.
# input parameters
x = 144
b = 2
tol = 10**-9
print("Find the root of %s to the power of %s." % (x, b))
# run optimization
res_nwt = root_newton(x, b, tol)
res_bin = root_binary(x, b, tol)
# evaluation
truth = x**(1/b)
print("Newton: %s w/ %s diverg. after %d iterations" % (res_nwt[-1], res_nwt[-1] - truth, len(res_nwt)))
print("Binary: %s w/ %s diverg. after %d iterations" % (res_bin[-1], res_bin[-1] - truth, len(res_bin)))
import matplotlib.pyplot as plt
%matplotlib inline
plt.loglog(range(len(res_nwt)), res_nwt, label='Newton method')
plt.loglog(range(len(res_bin)), res_bin, label='Binary search')
plt.xlim([1, max(len(res_nwt), len(res_bin))])
plt.xlabel('Iterations $k$')
plt.ylabel('Result $x$')
plt.legend()
plt.tight_layout()
if False:  # set to True to export the figure
    plt.savefig('../img/univar_loss_plot.png')
Find the root of 144 to the power of 2.
Newton: 12.0 w/ 0.0 deviation after 11 iterations
Binary: 12.000000000021828 w/ 2.1827872842550278e-11 deviation after 41 iterations
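The 41 bisection iterations agree with a rough back-of-the-envelope estimate: near the root, the break condition on $|x_k^b - x|$ translates into a bracket width of roughly $\mathrm{tol}/(b\,x^{(b-1)/b})$, and the initial bracket $[0, x/2]$ is halved once per iteration. The sketch below linearizes around the root and is therefore an approximation rather than an exact bound:
# rough estimate of the bisection iteration count (linearization near the root)
import math
width_tol = tol / (b * x**((b - 1) / b))  # bracket width matching |mean**b - x| < tol
print(math.ceil(math.log2((x / 2) / width_tol)))  # 41, matching the run above
Newton's method, by contrast, roughly doubles the number of correct digits per iteration once it is close to the root, which explains its far smaller iteration count.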