{ "cells": [ { "cell_type": "markdown", "id": "f49fcc79", "metadata": {}, "source": [ "# Adjusting Parameters in a Nonlinear Model Using the Levenberg-Marquardt Method" ] }, { "cell_type": "markdown", "id": "f84eb9c8", "metadata": {}, "source": [ "## 1. Introduction" ] }, { "cell_type": "markdown", "id": "5c47ca83", "metadata": {}, "source": [ "The Levenberg-Marquardt method, which is a part of the nonlinear_least_square( ) function can be used to adjust parameters of a nonlinear model such that it fits given data.\n", "\n", "More detailed the task of adjusting parameters can be reformulated to solving a nonlinear least-squares problem: \n", "\\begin{align*}\n", " \\min_\\limits{x \\in \\mathbb{R}^n} f(x)= \\frac{1}{2} \\sum_{j=1}^{m} r_j^2(x),\n", "\\end{align*}\n", "where each $r_j: \\mathbb{R}^n \\rightarrow \\mathbb{R}$ stands for a residual. In our situation we further assume that every $r_j$ is a smooth function and that $m \\geq n$ holds, which means that there are more data points than adjustable parameters.\n", "\n", "We first look at an example, which consists of a nonlinear model as well as respective data. With this, we show how to derive the nonlinear least-squares problem and especially the objective function $f$ which is to be minimized. Afterwards we solve the nonlinear least-squares problem by using the Levenberg-Marquardt method and obtain the adjusted model parameters.\n", "\n", "Finally, potential problems of the Levenberg-Marquardt which might occur due to poor scaling and large residuals of the problem are discussed. Here we look at another example and modify the optional input parameters of the Levenberg-Marquardt method." ] }, { "cell_type": "markdown", "id": "19f7b89f", "metadata": {}, "source": [ "## 2. Example" ] }, { "cell_type": "markdown", "id": "66146f4a", "metadata": {}, "source": [ "An example concerning \"Pasture regrowth\" presented in the book \"Statistical Tools for Nonlinear Regression\" by Huet et al. (2004) can be used to demonstrate the process. In the example we want to model the yield $y$ as a nonlinear function of the time $t$ since the last grazing. The goal is to adjust the parameters of the nonlinear function such that it fits the data." 
] }, { "cell_type": "markdown", "id": "bd69c1d7", "metadata": {}, "source": [ "### Data:" ] }, { "cell_type": "markdown", "id": "cdce3118", "metadata": {}, "source": [ "The data points $(t_j,y_j)$ are given in the following table:" ] }, { "cell_type": "markdown", "id": "7cdb0f98", "metadata": {}, "source": [ "| Time after grazing $t$ | 9 | 14 | 21 | 28 | 42 | 57 | 63 | 70 | 79 |\n", "| ---------------------- |---| ---| ---| ---|--- |--- |--- |--- |--- |\n", "| Yield $y$ | 8.93 |10.8 | 18.59 | 22.33 | 39.35 | 56.11 | 61.73 | 64.92 | 67.08 | " ] }, { "cell_type": "markdown", "id": "5bfd92ec", "metadata": {}, "source": [ "To illustrate the problem, we can visualize the data in a scatterplot:" ] }, { "cell_type": "code", "execution_count": null, "id": "5ae23af1", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "time = [9, 14, 21, 28, 42, 57, 63, 70, 79]\n", "yield_pasture = [8.93, 10.8, 18.59, 22.33, 39.35, 56.11, 61.73, 64.92, 67.08]\n", "\n", "fig=plt.figure(figsize=(9, 9))\n", "plt.xlabel(\"time $t$ after grazing\")\n", "plt.ylabel(\"yield $y$\")\n", "plt.title(\"Example data\")\n", "# plot the data set\n", "plt.plot(time, yield_pasture, \"b+\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "eec7b322", "metadata": {}, "source": [ "### Model:" ] }, { "cell_type": "markdown", "id": "fc983152", "metadata": {}, "source": [ "In our example the following nonlinear model function is chosen as a model:" ] }, { "cell_type": "markdown", "id": "c0b2b3f5", "metadata": {}, "source": [ "$$\\phi(x,t) = x_1-x_2\\exp[-\\exp(x_3+x_4 \\log(t_j))].$$" ] }, { "cell_type": "markdown", "id": "ed852052", "metadata": {}, "source": [ "In summary, the goal is now to adjust the four parameters $x_1$, $x_2$, $x_3$, $x_4$. In a first step we can formulate the corresponding nonlinear least-squares problem and afterwards we use the Levenberg-Marquardt method to solve it." ] }, { "cell_type": "markdown", "id": "11d5fb51", "metadata": {}, "source": [ "### Nonlinear Least-Squares Problem:" ] }, { "cell_type": "markdown", "id": "dd4576e7", "metadata": {}, "source": [ "The error between a predicted model value and its respective data point is usually called the residual. In our case we get the following residuals:\n", "\n", "$$ r_j(x) = x_1-x_2\\exp[-\\exp(x_3+x_4 \\log(t_j))]-y_j. $$" ] }, { "cell_type": "markdown", "id": "bba1feb6", "metadata": {}, "source": [ "In order that the model function fits best the data, the sum of the squared residuals must be minimized. Thus, we already obtain our objective function $f$:\n", "\\begin{align*}\n", " f(x)= \\frac{1}{2} \\sum_{j=1}^{m} r_j^2(x),\n", "\\end{align*}\n", "which is to be minimized." ] }, { "cell_type": "markdown", "id": "917b6329", "metadata": {}, "source": [ "## 3. Minimize the objective function $f$" ] }, { "cell_type": "markdown", "id": "fc2d6520", "metadata": {}, "source": [ "The Levenberg-Marquardt method is an algorithm to solve our minimization problem. To apply it to our example, we need as inputs the **residual vector** and the **Jacobian matrix** of the residual vector. We will define them both in the first part of this section. \n", "Besides, a **starting point** must be chosen." ] }, { "cell_type": "markdown", "id": "ac2ca314", "metadata": {}, "source": [ "The residual vector contains all residuals of our problem\n", "$r(x) = (r_1(x), \\ldots, r_m(x))^T$. Consequently, we can calculate the residual vector directly with the model function and the given data points. 
\n", "The following code snippet gives us the residual vector for our example task." ] }, { "cell_type": "code", "execution_count": null, "id": "ae9feb93", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def model(x, t):\n", " return x[0]-x[1]*np.exp(-np.exp(x[2]+x[3]*np.log(t)))\n", "\n", "def residual_vector(x): \n", " residuals = np.array([])\n", " for i in range(len(time)):\n", " residuals = np.append(residuals, model(x, time[i]) - yield_pasture[i])\n", " return residuals " ] }, { "cell_type": "markdown", "id": "ed0c6bd7", "metadata": {}, "source": [ "A key ingredient to the Levenberg-Marquardt method is that it is using a special approximation of the Hessian of $f(x)$, which is based on the Jacobian matrix of the residual vector $r(x)$. This is the reason why we also have to give the Jacobian as an input to the method.\n", "\n", "The Jacobian of the residual vector $r(x)$ can be derived by first calculating the gradient of each residual and finally use this to build the Jacobian of the residual vector. " ] }, { "cell_type": "code", "execution_count": null, "id": "df2f498c", "metadata": {}, "outputs": [], "source": [ "def gradient(x, t):\n", " derivation = np.array([1,-np.exp(-np.exp(x[2]+x[3]*np.log(t))),\n", " -x[1]*np.exp(-np.exp(x[2]+x[3]*np.log(t)))*(-np.exp(x[2]+x[3]*np.log(t))),\n", " -x[1]*np.log(t)*np.exp(-np.exp(x[2]+x[3]*np.log(t)))*(-np.exp(x[2]+x[3]*np.log(t)))])\n", " return derivation\n", "\n", "def jacobian(x):\n", " jacobian_matrix = gradient(x, time[0])[np.newaxis, :]\n", " for i in np.arange(1, len(yield_pasture)):\n", " jacobian_matrix = np.concatenate((jacobian_matrix, gradient(x, time[i])[np.newaxis, :]), axis=0)\n", " return jacobian_matrix " ] }, { "cell_type": "markdown", "id": "55f57b31", "metadata": {}, "source": [ "Before we can apply the Levenberg-Marquardt method, we have to choose a starting point.\n", "In general the starting point should be chosen as close to the minimum of the objective function $f$. 
Of course, we do not know the minimizer in advance, so the best available guess is used.\n", "\n", "For the example we could start the iteration at:" ] }, { "cell_type": "code", "execution_count": null, "id": "0912c4d5", "metadata": {}, "outputs": [], "source": [ "x0 = np.array([80, 70, -7, 0.1])" ] }, { "cell_type": "markdown", "id": "1c2e3a6c", "metadata": {}, "source": [ "With this we have everything we need to apply the Levenberg-Marquardt method.\n", "It is implemented in the nonlinear_least_square function, which can be imported with the following code:" ] }, { "cell_type": "code", "execution_count": null, "id": "8d66ea1a", "metadata": {}, "outputs": [], "source": [ "from oppy.leastSquares import nonlinear_least_square" ] }, { "cell_type": "markdown", "id": "360a86ce", "metadata": {}, "source": [ "Now it is time to run the Levenberg-Marquardt method and print the results:" ] }, { "cell_type": "code", "execution_count": null, "id": "05e82f50", "metadata": {}, "outputs": [], "source": [ "result = nonlinear_least_square(residual_vector, jacobian, x0)\n", "# print the results\n", "print(\"parameter vector x = {}\".format(result.x))\n", "print(\"norm of the gradient of f at x = {}\".format(result.res[-1]))" ] }, { "cell_type": "markdown", "id": "49f92bc8", "metadata": {}, "source": [ "Additionally, with the default settings, \"result\" contains information about the iteration process, such as the number of iterations, the norm of the gradient of $f$ at each iterate and the final iterate:" ] }, { "cell_type": "code", "execution_count": null, "id": "edeecd70", "metadata": {}, "outputs": [], "source": [ "print(result)" ] }, { "cell_type": "markdown", "id": "42be6a4e", "metadata": {}, "source": [ "## 4. Graphical Illustration of the Solution" ] }, { "cell_type": "markdown", "id": "cb9853b0", "metadata": {}, "source": [ "In our example it is useful to illustrate the solution by adding the nonlinear model function with its adjusted parameters to the scatterplot. This is the goal of the next code snippet." ] }, { "cell_type": "code", "execution_count": null, "id": "0028599b", "metadata": {}, "outputs": [], "source": [ "t = np.linspace(7, 80, num=10000)\n", "y = model(result.x, t)  # result.x is the adjusted parameter vector from above\n", "\n", "fig = plt.figure(figsize=(9, 9))\n", "plt.xlabel(\"time $t$ after grazing\")\n", "plt.ylabel(\"yield $y$\")\n", "plt.title(\"Example data and fitted model\")\n", "plt.plot(time, yield_pasture, \"b+\", label=\"example data\")\n", "plt.plot(t, y, \"r-\", label=\"nonlinear model\")\n", "plt.legend(loc=\"upper right\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5714c7a1", "metadata": {}, "source": [ "## 5. Potential Issues" ] }, { "cell_type": "markdown", "id": "ca55ee33", "metadata": {}, "source": [ "If the Levenberg-Marquardt method has problems or fails to converge, this is usually due to **poor scaling** properties of the problem or due to the problem being a **large-residual** problem. Poor scaling means that a change of $x$ in one direction leads to a large variation in the model function, whereas a change in another direction leads to only a small variation. The Levenberg-Marquardt method provides a **scaling option** which should be used in this situation.\n", "When large residuals occur around the minimizer of the objective function, the approximation of the Hessian used by the algorithm becomes inaccurate, which results in slow convergence."
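] }, { "cell_type": "markdown", "id": "4b9e2f1a", "metadata": {}, "source": [ "To make these two issues more concrete, it helps to write down the standard textbook form of the method; the exact damping and scaling strategy used inside nonlinear_least_square may differ in its details. With the Jacobian $J(x)$ of the residual vector $r(x)$ we have\n", "\n", "\\begin{align*}\n", " \\nabla f(x) = J(x)^T r(x), \\qquad \\nabla^2 f(x) = J(x)^T J(x) + \\sum_{j=1}^{m} r_j(x) \\nabla^2 r_j(x).\n", "\\end{align*}\n", "\n", "The Levenberg-Marquardt method keeps only the first term $J(x)^T J(x)$ as an approximation of the Hessian and computes a step $p$ from a damped system of the form\n", "\n", "\\begin{align*}\n", " (J(x)^T J(x) + \\lambda D^T D) p = -J(x)^T r(x),\n", "\\end{align*}\n", "\n", "where $\\lambda > 0$ is a damping parameter and $D$ is a scaling matrix (the identity if no scaling is used). If the residuals $r_j$ are large near the minimizer, the neglected term $\\sum_{j=1}^{m} r_j(x) \\nabla^2 r_j(x)$ is significant and the Hessian approximation becomes poor. If the problem is poorly scaled, a suitable choice of $D$ (enabled by the scaling option) adapts the step to the different sensitivities of the parameters."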
] }, { "cell_type": "markdown", "id": "ac9d91e7", "metadata": {}, "source": [ "### Poor Scaling - Example" ] }, { "cell_type": "markdown", "id": "24b982b9", "metadata": {}, "source": [ "When poor scaling occurs, the scaling option of the Levenberg-Marquardt method can be set to true. Consequently the algorithm reflects (approximately) the scaling properties of the underlying problem.\n", "Therefore we take a look at the example about the Feulgen hydrolysis kinetics, given in the book \"Numerische Mathematik 1: eine algorithmisch orientierte Einführung\" by Peter Deuflhard and Andreas Hohmann (2019)." ] }, { "cell_type": "markdown", "id": "a05981ae", "metadata": {}, "source": [ "**Data and Model function:**" ] }, { "cell_type": "markdown", "id": "34b65d80", "metadata": {}, "source": [ "The residuals are given by:\n", "\n", "$ r_j(x) = x_1 \\exp(-(x_2^2+x_3^2)t_j) \\frac{\\sinh((x_3^2)t_j)}{x_3^2}-y_j$,\n", "\n", "in the example. The data $t_j$ and $y_j$ are given in the following code:" ] }, { "cell_type": "code", "execution_count": null, "id": "475af358", "metadata": {}, "outputs": [], "source": [ "# Data\n", "# at each time point t_j one value y_j is measured\n", "time = [] \n", "for i in range(1, 31):\n", " time.append(6*i)\n", "\n", "measured = [24.19, 35.34, 43.43, 42.63, 49.92, 51.53,\n", " 57.39, 59.56, 55.60, 51.91, 58.27, 62.99,\n", " 52.99, 53.83, 59.37, 62.35, 61.84, 61.62,\n", " 49.64, 57.81, 54.79, 50.38, 43.85, 45.16,\n", " 46.72, 40.68, 35.14, 45.47, 42.40, 55.21] #y\n", "\n", "# Model function\n", "def model(x, t):\n", " return x[0]*np.exp(-(x[1]**2+x[2]**2)*t)*np.sinh((x[2]**2)*t)/(x[2]**2)" ] }, { "cell_type": "markdown", "id": "a710adbf", "metadata": {}, "source": [ "In a next step we can define the residual vector and its Jacobian matrix." ] }, { "cell_type": "code", "execution_count": null, "id": "db2f7acd", "metadata": {}, "outputs": [], "source": [ "# Define gradient of the residual\n", "def gradient(x, t):\n", " derivative = np.array([np.exp(-(x[1] ** 2 + x[2] ** 2) * t) * np.sinh((x[2] ** 2) * t) / (x[2] ** 2),\n", " -2 * x[0] * x[1] * t * np.exp(-(x[1] ** 2 + x[2] ** 2) * t) * np.sinh((x[2] ** 2) * t) / (x[2] ** 2),\n", " 2 * x[0] * np.exp(-(x[1] ** 2 + x[2] ** 2) * t) * (((x[2] ** 2) * t)\n", " * np.cosh((x[2] ** 2) * t) - (x[2]**2 * t + 1) * np.sinh((x[2] ** 2) * t)) / (x[2] ** 3)])\n", "\n", " return derivative\n", "\n", "# import functions to compute the residual vector and its Jacobian easily\n", "from oppy.tests.costfunctions import (data_least_square,\n", " gradient_data_least_square)\n", "# calculate the residual\n", "def residual_vector(x):\n", " return data_least_square(x, model, time, measured)\n", "# calculate the Jacobian\n", "def jacobian(x):\n", " return gradient_data_least_square(x, gradient, time, measured)" ] }, { "cell_type": "markdown", "id": "ea4652e5", "metadata": {}, "source": [ "The problem seems to be poorly scaled, a change in the direction $x_2$ has a larger effect than a change in the direction $x_1$ in the model function and thus in the respective objective function. Poor scaling makes the Levenberg-Marquardt method slow, as it results in only small steps. When we start with a starting value close to the minimizer the problem might not be problematic as the reduced step sizes might still be adequate." 
] }, { "cell_type": "code", "execution_count": null, "id": "05f6e107", "metadata": {}, "outputs": [], "source": [ "# easy starting point, close to the minimizer\n", "x0 = np.array([8, 0.055, 0.21])\n", "result = nonlinear_least_square(residual_vector, jacobian, x0)\n", "print(result)" ] }, { "cell_type": "markdown", "id": "1e002ed6", "metadata": {}, "source": [ "However, if we start the iteration further away from the minimizer the Levenberg-Marquardt method fails to converge in our example." ] }, { "cell_type": "code", "execution_count": null, "id": "6c18ae45", "metadata": {}, "outputs": [], "source": [ "# difficult starting point, further away from the minimizer\n", "x0 = 5 * np.array([8, 0.055, 0.21])\n", "result = nonlinear_least_square(residual_vector, jacobian, x0) # fails to converge\n", "# after 500 steps the minimizer is not found\n", "print(\"last iterate x = {}\".format(result.x))\n", "print(\"norm of the gradient of f at x = {}\".format(result.res[-1]))" ] }, { "cell_type": "markdown", "id": "91a42c79", "metadata": {}, "source": [ "Note that the two warning messages above are because the residual function is not valid on the whole domain, as the value gets too large which results in an overflow. However, this does not affect the iteration, as the Levenberg-Marquardt method is able to choose a different step in this case. " ] }, { "cell_type": "markdown", "id": "813f7a5f", "metadata": {}, "source": [ "The key to solving this problem is to use the scaling option, as it is demonstrated in the next code snippet:" ] }, { "cell_type": "code", "execution_count": null, "id": "00810f67", "metadata": {}, "outputs": [], "source": [ "# import Options\n", "from oppy.options import Options\n", "\n", "# set the options\n", "options = Options(scaling = True)\n", "\n", "# apply the Levenberg-Marquardt method with the scaling option\n", "result = nonlinear_least_square(residual_vector, jacobian, x0, options)\n", "print(result)" ] }, { "cell_type": "markdown", "id": "9fb08b6f", "metadata": {}, "source": [ "To sum up, when poor scaling properties are given in the underlying problem, the Levenberg-Marquardt method cannot make large enough steps and might fail to converge within 500 steps. Nevertheless, the option **\"scaling\"** should be used, such that the scaling properties are included and the algorithm still converges." ] }, { "cell_type": "markdown", "id": "6ddf0554", "metadata": {}, "source": [ "### Large Residuals - Example" ] }, { "cell_type": "markdown", "id": "b0a0880d", "metadata": {}, "source": [ "In general large residuals imply that an underlying approximation which is used to find a new step in the Levenberg-Marquardt method is inappropriate. Consequently we cannot expect a fast convergence rate and the algorithm might even fail to converge. Unfortunately, there is no option available to improve the algorithm in this case. If the Levenberg-Marquardt method fails to converge, a good idea would be to choose a Newton or quasi-Newton method as they can be found in the unconstrained algorithms within the *oppy* package." ] }, { "cell_type": "markdown", "id": "0504c78c", "metadata": {}, "source": [ "In the following example, the large-residual problem created by Brown and Dennis (1971) in \"New computational algorithms for minimizing a sum of squares of nonlinear functions\" is used. 
As we will see, the Levenberg-Marquardt method still converges to the minimizer in this case, so switching to a Newton or quasi-Newton method is not necessary.\n", "\n", "The residuals are given by:\n", "\n", "$$ r_j(x)= [x_1+x_2t_j- \\exp(t_j)]^2+[x_3+x_4\\sin(t_j)- \\cos(t_j)]^2, $$\n", "\n", "where $t_j = 0.2j$ and $j \\in \\{1,\\ldots,20\\}$.\n", "\n", "The following code cells apply the Levenberg-Marquardt method to this problem." ] }, { "cell_type": "code", "execution_count": null, "id": "b29f368b", "metadata": {}, "outputs": [], "source": [ "t = []\n", "for i in range(1, 21):\n", "    t.append(i*0.2)\n", "\n", "def residual_vector(x):  # define residuals\n", "    residuals = np.array([])\n", "    for i in range(len(t)):\n", "        residuals = np.append(residuals, ((x[0]+x[1]*t[i]-np.exp(t[i]))**2+(x[2]+x[3]*np.sin(t[i])-np.cos(t[i]))**2))\n", "    return residuals\n", "\n", "\n", "def gradient(x, t):\n", "    derivative = np.array([2*(x[0]+x[1]*t-np.exp(t)),\n", "                           2*t*(x[0]+x[1]*t-np.exp(t)),\n", "                           2*(x[2]+x[3]*np.sin(t)-np.cos(t)),\n", "                           2*np.sin(t)*(x[2]+x[3]*np.sin(t)-np.cos(t))])\n", "    return derivative\n", "\n", "\n", "def jacobian(x):\n", "    jacobian_matrix = gradient(x, t[0])[np.newaxis, :]\n", "    for i in np.arange(1, len(t)):\n", "        jacobian_matrix = np.concatenate((jacobian_matrix, gradient(x, t[i])[np.newaxis, :]), axis=0)\n", "    return jacobian_matrix" ] }, { "cell_type": "markdown", "id": "3cd3d15f", "metadata": {}, "source": [ "Now it is time to apply the Levenberg-Marquardt method. As a starting point, this time we choose:" ] }, { "cell_type": "code", "execution_count": null, "id": "ab28e641", "metadata": {}, "outputs": [], "source": [ "x0 = np.array([25, 5, -5, 1])" ] }, { "cell_type": "code", "execution_count": null, "id": "e048a001", "metadata": {}, "outputs": [], "source": [ "results = nonlinear_least_square(residual_vector, jacobian, x0)\n", "print(results)" ] }, { "cell_type": "markdown", "id": "0fc3a3f3", "metadata": {}, "source": [ "Here the Levenberg-Marquardt method worked fine. However, this might not always be the case. If the algorithm had not converged, the next option would have been to use a Newton or quasi-Newton method. For more information, please see the tutorials on unconstrained optimization. To demonstrate the process, we apply a quasi-Newton method." ] }, { "cell_type": "code", "execution_count": null, "id": "cd195e2c", "metadata": {}, "outputs": [], "source": [ "from oppy.unconOpt.quasinewton import quasinewton\n", "\n", "# define the objective function and its gradient\n", "def f_objective(y):\n", "    return 1/2*np.linalg.norm(residual_vector(y))**2\n", "\n", "def df_objective(y):\n", "    return np.dot(np.transpose(jacobian(y)), residual_vector(y))\n", "\n", "# apply the quasi-Newton function\n", "result_quasinewton = quasinewton(f_objective, df_objective, x0)\n", "print(result_quasinewton)" ] }, { "cell_type": "markdown", "id": "e04bba02", "metadata": {}, "source": [ "Here we can see that the quasi-Newton method was faster than the Levenberg-Marquardt method at solving the large-residual problem." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 5 }