{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Plot a confidence ellipse of a two-dimensional dataset\n\nThis example shows how to plot a confidence ellipse of a\ntwo-dimensional dataset, using its pearson correlation coefficient.\n\nThe approach that is used to obtain the correct geometry is\nexplained and proved here:\n\nhttps://carstenschelp.github.io/2018/09/14/Plot_Confidence_Ellipse_001.html\n\nThe method avoids the use of an iterative eigen decomposition algorithm\nand makes use of the fact that a normalized covariance matrix (composed of\npearson correlation coefficients and ones) is particularly easy to handle.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\nimport numpy as np\n\nfrom matplotlib.patches import Ellipse\nimport matplotlib.transforms as transforms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The plotting function itself\n\nThis function plots the confidence ellipse of the covariance of the given\narray-like variables x and y. The ellipse is plotted into the given\nAxes object *ax*.\n\nThe radiuses of the ellipse can be controlled by n_std which is the number\nof standard deviations. The default value is 3 which makes the ellipse\nenclose 98.9% of the points if the data is normally distributed\nlike in these examples (3 standard deviations in 1-D contain 99.7%\nof the data, which is 98.9% of the data in 2-D).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):\n \"\"\"\n Create a plot of the covariance confidence ellipse of *x* and *y*.\n\n Parameters\n ----------\n x, y : array-like, shape (n, )\n Input data.\n\n ax : matplotlib.axes.Axes\n The Axes object to draw the ellipse into.\n\n n_std : float\n The number of standard deviations to determine the ellipse's radiuses.\n\n **kwargs\n Forwarded to `~matplotlib.patches.Ellipse`\n\n Returns\n -------\n matplotlib.patches.Ellipse\n \"\"\"\n if x.size != y.size:\n raise ValueError(\"x and y must be the same size\")\n\n cov = np.cov(x, y)\n pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])\n # Using a special case to obtain the eigenvalues of this\n # two-dimensional dataset.\n ell_radius_x = np.sqrt(1 + pearson)\n ell_radius_y = np.sqrt(1 - pearson)\n ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,\n facecolor=facecolor, **kwargs)\n\n # Calculating the standard deviation of x from\n # the squareroot of the variance and multiplying\n # with the given number of standard deviations.\n scale_x = np.sqrt(cov[0, 0]) * n_std\n mean_x = np.mean(x)\n\n # calculating the standard deviation of y ...\n scale_y = np.sqrt(cov[1, 1]) * n_std\n mean_y = np.mean(y)\n\n transf = transforms.Affine2D() \\\n .rotate_deg(45) \\\n .scale(scale_x, scale_y) \\\n .translate(mean_x, mean_y)\n\n ellipse.set_transform(transf + ax.transData)\n return ax.add_patch(ellipse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A helper function to create a correlated dataset\n\nCreates a random two-dimensional dataset with the specified\ntwo-dimensional mean (mu) and dimensions (scale).\nThe correlation can be controlled by the param 'dependency',\na 2x2 matrix.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get_correlated_dataset(n, dependency, mu, scale):\n latent = np.random.randn(n, 2)\n dependent = latent.dot(dependency)\n scaled = dependent * scale\n scaled_with_offset = scaled + mu\n # return x and y of the new, correlated dataset\n return scaled_with_offset[:, 0], scaled_with_offset[:, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Positive, negative and weak correlation\n\nNote that the shape for the weak correlation (right) is an ellipse,\nnot a circle because x and y are differently scaled.\nHowever, the fact that x and y are uncorrelated is shown by\nthe axes of the ellipse being aligned with the x- and y-axis\nof the coordinate system.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "np.random.seed(0)\n\nPARAMETERS = {\n 'Positive correlation': [[0.85, 0.35],\n [0.15, -0.65]],\n 'Negative correlation': [[0.9, -0.4],\n [0.1, -0.6]],\n 'Weak correlation': [[1, 0],\n [0, 1]],\n}\n\nmu = 2, 4\nscale = 3, 5\n\nfig, axs = plt.subplots(1, 3, figsize=(9, 3))\nfor ax, (title, dependency) in zip(axs, PARAMETERS.items()):\n x, y = get_correlated_dataset(800, dependency, mu, scale)\n ax.scatter(x, y, s=0.5)\n\n ax.axvline(c='grey', lw=1)\n ax.axhline(c='grey', lw=1)\n\n confidence_ellipse(x, y, ax, edgecolor='red')\n\n ax.scatter(mu[0], mu[1], c='red', s=3)\n ax.set_title(title)\n\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Different number of standard deviations\n\nA plot with n_std = 3 (blue), 2 (purple) and 1 (red)\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax_nstd = plt.subplots(figsize=(6, 6))\n\ndependency_nstd = [[0.8, 0.75],\n [-0.2, 0.35]]\nmu = 0, 0\nscale = 8, 5\n\nax_nstd.axvline(c='grey', lw=1)\nax_nstd.axhline(c='grey', lw=1)\n\nx, y = get_correlated_dataset(500, dependency_nstd, mu, scale)\nax_nstd.scatter(x, y, s=0.5)\n\nconfidence_ellipse(x, y, ax_nstd, n_std=1,\n label=r'$1\\sigma$', edgecolor='firebrick')\nconfidence_ellipse(x, y, ax_nstd, n_std=2,\n label=r'$2\\sigma$', edgecolor='fuchsia', linestyle='--')\nconfidence_ellipse(x, y, ax_nstd, n_std=3,\n label=r'$3\\sigma$', edgecolor='blue', linestyle=':')\n\nax_nstd.scatter(mu[0], mu[1], c='red', s=3)\nax_nstd.set_title('Different standard deviations')\nax_nstd.legend()\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the keyword arguments\n\nUse the keyword arguments specified for `matplotlib.patches.Patch` in order\nto have the ellipse rendered in different ways.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax_kwargs = plt.subplots(figsize=(6, 6))\ndependency_kwargs = [[-0.8, 0.5],\n [-0.2, 0.5]]\nmu = 2, -3\nscale = 6, 5\n\nax_kwargs.axvline(c='grey', lw=1)\nax_kwargs.axhline(c='grey', lw=1)\n\nx, y = get_correlated_dataset(500, dependency_kwargs, mu, scale)\n# Plot the ellipse with zorder=0 in order to demonstrate\n# its transparency (caused by the use of alpha).\nconfidence_ellipse(x, y, ax_kwargs,\n alpha=0.5, facecolor='pink', edgecolor='purple', zorder=0)\n\nax_kwargs.scatter(x, y, s=0.5)\nax_kwargs.scatter(mu[0], mu[1], c='red', s=3)\nax_kwargs.set_title('Using keyword arguments')\n\nfig.subplots_adjust(hspace=0.25)\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ ".. tags::\n\n plot-type: speciality\n plot-type: scatter\n component: ellipse\n component: patch\n domain: statistics\n\n.. admonition:: References\n\n The use of the following functions, methods, classes and modules is shown\n in this example:\n\n - `matplotlib.transforms.Affine2D`\n - `matplotlib.patches.Ellipse`\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.2" } }, "nbformat": 4, "nbformat_minor": 0 }