{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interpretation of linear regression coefficients.\n", "\n", "The module `statsmodels` gives a particularly convenient R-like formula approach to fitting linear models.\n", "It allows for a model specification of the form `outcome ~ predictors`. We give an example below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "GMMLqAkYRxb5" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import sklearn.linear_model as lm\n", "import sklearn as skl\n", "import statsmodels.formula.api as smf\n", "\n", "## this sets some style parameters\n", "sns.set()\n", "\n", "## Read in the data and display a few rows\n", "dat = pd.read_csv(\"https://raw.githubusercontent.com/bcaffo/ds4bme_intro/master/data/oasis.csv\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 404 }, "colab_type": "code", "id": "fDJOnSsxe2N5", "outputId": "07acb08e-572d-4318-85b2-b59099335efb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Results: Ordinary least squares\n", "=================================================================\n", "Model: OLS Adj. R-squared: 0.743 \n", "Dependent Variable: PD AIC: 95.4183 \n", "Date: 2024-01-29 06:48 BIC: 116.2597\n", "No. Observations: 100 Log-Likelihood: -39.709 \n", "Df Model: 7 F-statistic: 41.98 \n", "Df Residuals: 92 Prob (F-statistic): 5.56e-26\n", "R-squared: 0.762 Scale: 0.14081 \n", "------------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------\n", "Intercept 0.2349 0.1231 1.9086 0.0594 -0.0095 0.4794\n", "FLAIR -0.0160 0.0761 -0.2107 0.8336 -0.1671 0.1351\n", "T1 -0.2116 0.0777 -2.7251 0.0077 -0.3659 -0.0574\n", "T2 0.6078 0.0747 8.1323 0.0000 0.4593 0.7562\n", "FLAIR_10 -0.2581 0.3078 -0.8386 0.4039 -0.8693 0.3532\n", "T1_10 0.2212 0.1494 1.4810 0.1420 -0.0755 0.5179\n", "T2_10 0.1103 0.2642 0.4177 0.6771 -0.4143 0.6350\n", "FLAIR_20 1.8072 0.6423 2.8136 0.0060 0.5315 3.0828\n", "-----------------------------------------------------------------\n", "Omnibus: 2.142 Durbin-Watson: 2.187\n", "Prob(Omnibus): 0.343 Jarque-Bera (JB): 1.725\n", "Skew: -0.075 Prob(JB): 0.422\n", "Kurtosis: 3.626 Condition No.: 40 \n", "=================================================================\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the\n", "errors is correctly specified.\n" ] } ], "source": [ "results = smf.ols('PD ~ FLAIR + T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit()\n", "print(results.summary2())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The interpretation of the FLAIR coefficient is as follows. We estimate an expected 0.0160 decrease in proton density per 1 unit change in FLAIR - *with all of the remaining model terms held constant*. The latter statements is important to remember. That is, it's improtant to remember that coefficients are adjusted for the linear associations with other variables. One way to think about this is that both the PD and FLAIR variables have had the linear association with the other variables removed before relating them. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "The same is true for the other variables. The coefficient for T1 is interpreted similarly: it is the relationship between PD and T1 after the linear associations with the other variables have been removed from both. Let's show this for the FLAIR variable." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Results: Ordinary least squares\n", "================================================================\n", "Model: OLS Adj. R-squared: -0.010 \n", "Dependent Variable: PD_adjusted AIC: 83.4183\n", "Date: 2024-01-29 06:48 BIC: 88.6286\n", "No. Observations: 100 Log-Likelihood: -39.709\n", "Df Model: 1 F-statistic: 0.04730\n", "Df Residuals: 98 Prob (F-statistic): 0.828 \n", "R-squared: 0.000 Scale: 0.13219\n", "----------------------------------------------------------------\n", " Coef. Std.Err. t P>|t| [0.025 0.975]\n", "----------------------------------------------------------------\n", "Intercept -0.0000 0.0364 -0.0000 1.0000 -0.0722 0.0722\n", "FLAIR_adjusted -0.0160 0.0737 -0.2175 0.8283 -0.1623 0.1303\n", "----------------------------------------------------------------\n", "Omnibus: 2.142 Durbin-Watson: 2.187\n", "Prob(Omnibus): 0.343 Jarque-Bera (JB): 1.725\n", "Skew: -0.075 Prob(JB): 0.422\n", "Kurtosis: 3.626 Condition No.: 2 \n", "================================================================\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the\n", "errors is correctly specified.\n" ] } ], "source": [ "## Residuals of PD after regressing out every predictor except FLAIR\n", "dat['PD_adjusted'] = smf.ols('PD ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid\n", "## Residuals of FLAIR after regressing out the same predictors\n", "dat['FLAIR_adjusted'] = smf.ols('FLAIR ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid\n", "\n", "## Regress the adjusted outcome on the adjusted predictor\n", "out = smf.ols('PD_adjusted ~ FLAIR_adjusted', data = dat).fit()\n", "print(out.summary2())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the coefficient is exactly the same (-0.0160). This highlights how linear regression \"adjusts\" for the other variables: it removes the linear association with them from both the explanatory and outcome variables. (The standard error differs slightly only because the single-predictor fit does not account for the degrees of freedom used in the adjustment step.)" ] } ], "metadata": { "colab": { "include_colab_link": true, "name": "notebook5.a.ipynb", "provenance": [] }, "interpreter": { "hash": "625a8f875bfb3f569e4f618df17e1f8389970b6b26ee2c84acb92c5fbebf95c3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 4 }