{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solve the negative cosine problem\n", "\n", "## Introduction\n", "In this notebook we want to minimize the function\n", "\\begin{align*}\n", "-\\cos : \\mathbb{R} &\\to \\mathbb{R}\\\\\n", "x &\\mapsto -\\cos(x)\n", "\\end{align*}\n", "to demonstrate the usage of the optimization package oppy. After importing all required dependencies and\n", "modules, we will take a quick look at the function itself and then start the minimization process." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import several cost functions\n", "from oppy.tests.costfunctions import negativeCosine, gradient_negativeCosine, hessian_negativeCosine\n", "\n", "# load standard modules\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now we plot the function and its derivatives:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(0, 2*np.pi, 200)\n", "y = negativeCosine(x)\n", "y_first_deriv = gradient_negativeCosine(x)\n", "y_second_deriv = np.cos(x)\n", "\n", "# raw strings avoid invalid escape sequences in the LaTeX labels\n", "plt.plot(x, y, label=r\"$-\\cos(x)$\")\n", "plt.plot(x, y_first_deriv, label=r\"$(-\\cos(x))'$\")\n", "plt.plot(x, y_second_deriv, label=r\"$(-\\cos(x))''$\")\n", "plt.legend(loc=\"upper right\")\n", "plt.title(\"Negative cosine and its derivatives\")\n", "plt.show()" ] },
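{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, we can compare the analytic gradient against a central finite-difference approximation of ``negativeCosine``. This small sketch assumes, as in the plotting cell above, that both functions accept NumPy arrays:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# central finite differences: (f(x+h) - f(x-h)) / (2h) approximates f'(x)\n", "h = 1e-6\n", "x_check = np.linspace(0, 2*np.pi, 50)\n", "fd_grad = (negativeCosine(x_check + h) - negativeCosine(x_check - h)) / (2*h)\n", "print(\"max deviation:\", np.max(np.abs(fd_grad - gradient_negativeCosine(x_check))))" ] },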
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Minimization Process\n", "\n", "### Armijo and Wolfe linesearch with gradient descent method\n", "First, we need to import the linesearch algorithms, as well as the gradient descent method and the Options class." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import linesearch methods\n", "from oppy.unconOpt import armijo, wolfe\n", "\n", "# import optimization methods\n", "from oppy.unconOpt import gradmethod\n", "\n", "# import the Options class\n", "from oppy.options import Options" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "As a first (and relatively simple) step, we minimize the negative cosine function with mostly default options. Afterwards, we will change the options manually." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set the initial/starting point\n", "x0 = np.array([1.1655])\n", "# initial guess, feel free to try different guesses\n", "#x0 = np.array([1.1656])\n", "#x0 = np.array([1.9])\n", "#x0 = np.array([-np.arctan(-np.pi)])\n", "\n", "# show intermediate steps while the algorithm is executed\n", "disp = True\n", "# use='scale' is the default, so the descent direction is always the normalized gradient\n", "use = 'scale'\n", "\n", "options_grad = Options(disp=disp)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now we can test our descent method:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict = {}\n", "\n", "result_dict['res_grad'] = gradmethod(negativeCosine, gradient_negativeCosine, x0, options_grad)\n", "\n", "# print results of the optimization process\n", "print(\"x_grad = {}\".format(result_dict[\"res_grad\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_grad\"].res[-1]))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We notice that the norm of the residual is already fairly small, but we want faster convergence, which we pursue with custom options and more advanced optimization algorithms. We start with the Armijo and Wolfe linesearch methods using custom options. For that, we prepare the initial arguments of the optimization methods: $x_0, \\gamma, \\beta, \\max_{\\text{iter}},\n", "c_1, c_2$, etc.\n", "We parameterize the functions with these options using the ``oppy.options.Options`` class. All oppy functions take this class as a parameter to specify the options, but the accepted structures and keywords can differ." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# options for the linesearch methods, feel free to try different settings\n", "initstep = 1\n", "gamma = 1e-2\n", "beta = 0.5\n", "nmax = 30\n", "c1 = 1e-4\n", "c2 = 0.4\n", "\n", "# define custom options for armijo linesearch\n", "options_armijo = Options(initstep=initstep, beta=beta, c1=c1, nmax=nmax)\n", "\n", "def linesearch_armijo(x, d):\n", "    t, flag = armijo(negativeCosine, gradient_negativeCosine, x, d, options_armijo)\n", "    return t, flag\n", "\n", "# define custom options for wolfe linesearch\n", "options_wolfe = Options(initstep=initstep, c1=c1, c2=c2, beta=beta, nmax=nmax)\n", "\n", "def linesearch_wolfe(x, d):\n", "    t, flag = wolfe(negativeCosine, gradient_negativeCosine, x, d, options_wolfe)\n", "    return t, flag" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, we plug these linesearch methods into our gradient descent method:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# armijo linesearch\n", "\n", "options_grad_armijo = Options(disp=disp, linesearch=linesearch_armijo)\n", "\n", "result_dict['res_grad_armijo'] = gradmethod(negativeCosine, gradient_negativeCosine, x0,\n", "                                            options_grad_armijo)\n", "\n", "# print results of the optimization process\n", "print('x_grad = {}'.format(result_dict['res_grad_armijo'].x))\n", "print('norm gradient = {}'.format(result_dict['res_grad_armijo'].res[-1]))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# wolfe linesearch\n", "\n", "options_grad_wolfe = Options(disp=disp, linesearch=linesearch_wolfe)\n", "\n", "result_dict['res_grad_wolfe'] = gradmethod(negativeCosine, gradient_negativeCosine, x0,\n", "                                           options_grad_wolfe)\n", "\n", "# print results of the optimization process\n", "print('x_grad = {}'.format(result_dict['res_grad_wolfe'].x))\n", "print('norm gradient = {}'.format(result_dict['res_grad_wolfe'].res[-1]))" ] },
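{ "cell_type": "markdown", "metadata": {}, "source": [ "To compare the runs a bit more quantitatively, we can inspect the length of the residual history stored in each result object (assuming, consistent with the prints above and the convergence plot at the end, that ``res`` holds one gradient norm per iteration):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# number of recorded residuals and final gradient norm for each run so far\n", "for key in ['res_grad', 'res_grad_armijo', 'res_grad_wolfe']:\n", "    print('{}: {} residuals, final norm {}'.format(\n", "        key, len(result_dict[key].res), result_dict[key].res[-1]))" ] },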
{ "cell_type": "markdown", "metadata": {}, "source": [ "Both linesearch methods yield similar results, so to improve our optimization process further we switch to\n", "a more advanced optimization algorithm: Newton's method." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Newton's method using Armijo and Wolfe linesearch\n", "We have to load some more modules. To compare the error histories and thus the convergence rates, we keep the main options, such as the tolerance and the maximum number of iterations, the same." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from oppy.unconOpt import newton\n", "# define the options using armijo and wolfe linesearch\n", "options_newton_armijo = Options(disp=disp, linesearch=linesearch_armijo)\n", "options_newton_wolfe = Options(disp=disp, linesearch=linesearch_wolfe)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# armijo linesearch algorithm\n", "result_dict[\"res_newton_armijo\"] = newton(negativeCosine, gradient_negativeCosine, hessian_negativeCosine,\n", "                                          x0, options_newton_armijo)\n", "print(\"x_newton_armijo = {}\".format(result_dict[\"res_newton_armijo\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_newton_armijo\"].res[-1]))\n", "# wolfe linesearch algorithm\n", "result_dict[\"res_newton_wolfe\"] = newton(negativeCosine, gradient_negativeCosine, hessian_negativeCosine,\n", "                                         x0, options_newton_wolfe)\n", "print(\"\\nx_newton_wolfe = {}\".format(result_dict[\"res_newton_wolfe\"].x))\n", "print(\"norm gradient = {}\".format(result_dict[\"res_newton_wolfe\"].res[-1]))" ] },
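{ "cell_type": "markdown", "metadata": {}, "source": [ "Before the final comparison plot, we can already print the residual histories of gradient descent and Newton's method side by side (again assuming that ``res`` stores the gradient norm per iteration):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# the shorter history with rapidly shrinking norms belongs to Newton's method\n", "print('gradient descent (wolfe):', result_dict['res_grad_wolfe'].res)\n", "print('newton (wolfe):', result_dict['res_newton_wolfe'].res)" ] },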
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can clearly see that Newton's method converges faster and yields a more precise result." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Comparison of the different algorithms\n", "In this step, we extend our repertoire of optimization methods by the quasi-Newton method and the nonlinear CG method. To do so, we need some more imports:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import optimization methods\n", "from oppy.unconOpt import nonlinear_cg, quasinewton\n", "\n", "# options for nonlinear cg\n", "options_nl_cg_wolfe = Options(disp=disp, linesearch=linesearch_wolfe)\n", "\n", "# options for quasinewton method\n", "options_qn_wolfe = Options(disp=disp, linesearch=linesearch_wolfe)\n", "\n", "result_dict['res_nonlinear_cg_wolfe'] = nonlinear_cg(negativeCosine, gradient_negativeCosine, x0,\n", "                                                     options_nl_cg_wolfe)\n", "\n", "result_dict['res_qn_wolfe'] = quasinewton(negativeCosine, gradient_negativeCosine, x0,\n", "                                          options_qn_wolfe)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion: Comparison of convergence rates\n", "In our last step, we compare all of the optimization methods that used Wolfe linesearch. To do so, we filter our ``result_dict``:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result_dict_wolfe = {key: value for key, value in result_dict.items() if key.endswith('wolfe')}" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plot the norm of the gradients in a semilogarithmic plot\n", "\n", "plt.figure(figsize=(15, 10))\n", "for key in result_dict_wolfe:\n", "    plt.semilogy(result_dict_wolfe[key].res, label=key)\n", "plt.legend(loc=\"upper right\")\n", "plt.title(\"Convergence results for different methods\")\n", "plt.xlabel(\"Number of iterations\")\n", "plt.ylabel(\"Norm of the gradient\")\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "483.094px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }