Optimization generally involves defining a cost (or loss) function prior to the actual minimization, which assesses each intermediate candidate $x_k, \, k \in \mathbb{N}$ by how much it deviates from the point of convergence. We employ the *squared loss* here, as it is differentiable; it is given by

$$L(x_k) = \left(x_k^b - x\right)^2$$

where $x$ represents the requested input, which is constant.

In [1]:

```
L = lambda x_k, x, b: (x_k**b - x)**2
```
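As a quick illustration (using the example values $x = 144$ and $b = 2$, chosen here purely for demonstration), the loss vanishes exactly at the root and grows with the squared deviation from it:

```python
# squared loss restated from the cell above
L = lambda x_k, x, b: (x_k**b - x)**2

print(L(12, 144, 2))  # 0, since 12**2 equals 144 exactly
print(L(11, 144, 2))  # 529, i.e. (121 - 144)**2
```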

Twice-differentiability of the loss function is an important requirement for Newton's method. The first-order derivative

$$\frac{\partial}{\partial x_k} L(x_k)=2b\left(x_k^{2b-1}-x\,x_k^{b-1}\right)$$

of our cost function is obtained from the chain rule and implemented as

In [2]:

```
d1L = lambda x_k, x, b: 2*b*(x_k**(2*b-1) - x*x_k**(b-1))
```

Similarly, we obtain the second-order derivative

$$\frac{\partial^2}{\partial x_k^2} L(x_k)=2b\left((2b-1)x_k^{2b-2}-(b-1)\,x\,x_k^{b-2}\right)$$

which is implemented as

In [3]:

```
d2L = lambda x_k, x, b: 2*b*((2*b-1)*x_k**(2*b-2) - (b-1)*x*x_k**(b-2))
```
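As a sanity check on the two closed-form derivatives, they can be compared against central finite differences at an arbitrary probe point; the values below ($x_k = 3.5$, $x = 12$, $b = 2$, step $h = 10^{-5}$) are merely illustrative choices:

```python
# loss and derivatives restated from the cells above
L = lambda x_k, x, b: (x_k**b - x)**2
d1L = lambda x_k, x, b: 2*b*(x_k**(2*b-1) - x*x_k**(b-1))
d2L = lambda x_k, x, b: 2*b*((2*b-1)*x_k**(2*b-2) - (b-1)*x*x_k**(b-2))

# central finite differences as an independent numerical check
x_k, x, b, h = 3.5, 12, 2, 10**-5
fd1 = (L(x_k + h, x, b) - L(x_k - h, x, b)) / (2*h)
fd2 = (L(x_k + h, x, b) - 2*L(x_k, x, b) + L(x_k - h, x, b)) / h**2
print(abs(fd1 - d1L(x_k, x, b)))  # close to zero
print(abs(fd2 - d2L(x_k, x, b)))  # close to zero
```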

Using Newton's method, we aim to find a stationary point where $L'(x_k)=0$, which may be reached via a second-order Taylor expansion yielding the update rule

$$x_{k+1} = x_k - \frac{L'(x_k)}{L''(x_k)}=x_k-\left(\frac{\partial}{\partial x_k} L(x_k)\right)\left(\frac{\partial^2}{\partial x_k^2} L(x_k)\right)^{-1}$$

which is implemented hereafter with a tolerance value and a maximum number of iterations as break conditions.

In [4]:

```
def root_newton(x, b, tol=10**-17, max_iter=2000):
    x_k = x / 2  # starting guess
    x_list = [x_k]  # list collecting all candidates
    while abs(d1L(x_k, x, b)) > tol and len(x_list) < max_iter:
        x_k -= d1L(x_k, x, b) / d2L(x_k, x, b)  # Newton update step
        x_list.append(x_k)
    return x_list
```
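As a usage sketch (with the definitions restated so the cell runs standalone), computing the square root of 144 converges to 12 within a handful of iterations:

```python
# restated from the cells above so this example is self-contained
d1L = lambda x_k, x, b: 2*b*(x_k**(2*b-1) - x*x_k**(b-1))
d2L = lambda x_k, x, b: 2*b*((2*b-1)*x_k**(2*b-2) - (b-1)*x*x_k**(b-2))

def root_newton(x, b, tol=10**-17, max_iter=2000):
    x_k = x / 2  # starting guess
    x_list = [x_k]  # list collecting all candidates
    while abs(d1L(x_k, x, b)) > tol and len(x_list) < max_iter:
        x_k -= d1L(x_k, x, b) / d2L(x_k, x, b)  # Newton update step
        x_list.append(x_k)
    return x_list

candidates = root_newton(144, 2, tol=10**-9)
print(candidates[-1])   # approximately 12
print(len(candidates))  # number of candidates until convergence
```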

For the sake of benchmarking, the binary search algorithm serves as a baseline in the following.

In [5]:

```
def root_binary(x, b, tol=10**-16, max_iter=2000):
    low = 0
    high = max(x, 1)  # upper bound on the root (also covers x < 1)
    mean = (high + low) / 2
    x_list = [mean]  # list collecting all candidates
    while abs(mean**b - x) > tol and len(x_list) < max_iter:
        mean = (high + low) / 2
        if mean**b > x:  # root lies in the lower half
            high = mean
        else:  # root lies in the upper half
            low = mean
        x_list.append(mean)
    return x_list
```
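Analogously, a standalone call to the bisection routine (restated here so the cell runs on its own) approaches the same root, albeit over more iterations:

```python
# restated from the cell above so this example is self-contained
def root_binary(x, b, tol=10**-16, max_iter=2000):
    low = 0
    high = max(x, 1)  # upper bound on the root (also covers x < 1)
    mean = (high + low) / 2
    x_list = [mean]  # list collecting all candidates
    while abs(mean**b - x) > tol and len(x_list) < max_iter:
        mean = (high + low) / 2
        if mean**b > x:  # root lies in the lower half
            high = mean
        else:  # root lies in the upper half
            low = mean
        x_list.append(mean)
    return x_list

approximations = root_binary(144, 2, tol=10**-9)
print(approximations[-1])  # approximately 12
```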

The requested input $x\in\mathbb{R}_+$ for the root computation and the respective exponent $b\in\mathbb{N}$ are set hereafter and can be varied to observe their effects on the computational process. From the plot below, it can be seen that Newton's method generally converges to a more accurate result within fewer iterations, whereas the binary search yields good initial approximations that oscillate around the convergence point $x^{1/b}$.

In [6]:

```
# input parameters
x = 144
b = 2
tol = 10**-9
print("Find the root of %s to the power of %s." % (x, b))
# run optimization
res_nwt = root_newton(x, b, tol)
res_bin = root_binary(x, b, tol)
# evaluation
truth = x**(1/b)
print("Newton: %s w/ %s diverg. after %d iterations" % (res_nwt[-1], res_nwt[-1] - truth, len(res_nwt)))
print("Binary: %s w/ %s diverg. after %d iterations" % (res_bin[-1], res_bin[-1] - truth, len(res_bin)))
import matplotlib.pyplot as plt
%matplotlib inline
plt.loglog(range(len(res_nwt)), res_nwt, label='Newton method')
plt.loglog(range(len(res_bin)), res_bin, label='Binary search')
plt.xlim([1, max(len(res_nwt), len(res_bin))])
plt.xlabel('Iterations $k$')
plt.ylabel('Result $x$')
plt.legend()
plt.tight_layout()
if False:  # set to True to export the figure
    plt.savefig('../img/univar_loss_plot.png')
```