Newton’s Method Optimization#
Function in One Variable#
You will use Newton's method to optimize a function \(f\left(x\right)\). Aiming to find a point where the derivative equals zero, you start from some initial point \(x_0\), calculate the first and second derivatives (\(f'(x_0)\) and \(f''(x_0)\)), and step to the next point using the expression:
\[x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)}.\]
Repeat the process iteratively. The number of iterations \(n\) is usually also a parameter.
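This update can be motivated by a short derivation (a standard argument, not spelled out in the original text): replace \(f\) near \(x_0\) with its second-order Taylor polynomial and minimize that quadratic model instead,
\[f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2.\]
Setting the derivative of the right-hand side to zero gives
\[f'(x_0) + f''(x_0)(x - x_0) = 0 \quad\Longrightarrow\quad x = x_0 - \frac{f'(x_0)}{f''(x_0)}.\]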
Let's optimize the function \(f\left(x\right)=e^x - \log(x)\) (defined for \(x>0\)) using Newton's method. To implement it in code, define the function \(f\left(x\right)=e^x - \log(x)\) and its first and second derivatives:
import numpy as np
import matplotlib.pyplot as plt
import jax, jax.numpy as jnp
def f_example_1(x):
    return jnp.exp(x) - jnp.log(x)
dfdx_example_1 = jax.grad(f_example_1)
d2fdx2_example_1 = jax.grad(dfdx_example_1)
x_0 = 1.6
print(f"f({x_0}) = {f_example_1(x_0)}")
print(f"f'({x_0}) = {dfdx_example_1(x_0)}")
print(f"f''({x_0}) = {d2fdx2_example_1(x_0)}")
Implement Newton’s method described above.
def newton_method(dfdx, d2fdx2, x, num_iterations=100):
    for iteration in range(num_iterations):
        # Newton's update: x <- x - f'(x) / f''(x).
        x = x - dfdx(x) / d2fdx2(x)
        print(x)
    return x
num_iterations_example_1 = 25; x_initial = 1.6
newtons_example_1 = newton_method(dfdx_example_1, d2fdx2_example_1, x_initial, num_iterations_example_1)
print("Newton's method result: x_min =", newtons_example_1)
Let's compare with the Gradient Descent method.
def gradient_descent(dfdx, x, learning_rate=0.1, num_iterations=100):
    for iteration in range(num_iterations):
        x = x - learning_rate * dfdx(x)
        print(x)
    return x
num_iterations = 35; learning_rate = 0.25; x_initial = 1.6
gd_example_1 = gradient_descent(dfdx_example_1, x_initial, learning_rate, num_iterations)
print("Gradient descent result: x_min =", gd_example_1)
Function in Two Variables#
In the case of a function of two variables, Newton's method requires even more computation. Starting from the initial point \((x_0, y_0)\), the step to the next point is taken using the expression:
\[\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - H^{-1}\left(x_0, y_0\right) \nabla f\left(x_0, y_0\right),\]
where \(H^{-1}\left(x_0, y_0\right)\) is the inverse of the Hessian matrix at the point \((x_0, y_0)\) and \(\nabla f\left(x_0, y_0\right)\) is the gradient at that point.
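Written out explicitly for a function of two variables, the gradient and the Hessian collect the first and second partial derivatives of \(f\):
\[\nabla f\left(x, y\right) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}, \qquad H\left(x, y\right) = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}.\]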
Let's implement that in the code. Define the function \(f(x, y)\) as in the videos, along with its gradient and Hessian:
import jax
import jax.numpy as jnp
def f_example_2(xy):
    x, y = xy[0], xy[1]
    return x**4 + 0.8*y**4 + 4*x**2 + 2*y**2 - x*y - 0.2*x**2*y
grad_f_example_2 = jax.grad(f_example_2)
hessian_f_example_2 = jax.hessian(f_example_2)
# Example usage:
xy_0 = jnp.array([4.0, 4.0])
print("f(4,4) =", f_example_2(xy_0))
print("grad f(4,4) =", grad_f_example_2(xy_0))
print("Hessian f(4,4) =", hessian_f_example_2(xy_0))
def newton_method_2(hessian_f, grad_f, x_y, num_iterations=100):
    for iteration in range(num_iterations):
        # Newton's update: subtract the inverse Hessian times the gradient.
        x_y = x_y - jnp.linalg.inv(hessian_f(x_y)) @ grad_f(x_y)
        print(x_y)
    return x_y
num_iterations_example_2 = 25
x_y_initial = jnp.array([4.0, 4.0])
newtons_example_2 = newton_method_2(hessian_f_example_2, grad_f_example_2,
x_y_initial, num_iterations=num_iterations_example_2)
print("Newton's method result: x_min, y_min =", newtons_example_2)
Compare with the Gradient Descent method.
def gradient_descent_2(grad_f, x_y, learning_rate=0.1, num_iterations=100):
    for iteration in range(num_iterations):
        x_y = x_y - learning_rate * grad_f(x_y)
        print(x_y)
    return x_y
num_iterations_2 = 300; learning_rate_2 = 0.01; x_y_initial = jnp.array([4.0, 4.0])
# Alternative settings to try: num_iterations_2 = 300; learning_rate_2 = 0.03
gd_example_2 = gradient_descent_2(grad_f_example_2, x_y_initial, learning_rate_2, num_iterations_2)
print("Gradient descent result: x_min, y_min =", gd_example_2)