Interpretation of linear regression coefficients

The statsmodels module provides a particularly convenient R-like formula interface for fitting linear models. It allows a model specification of the form outcome ~ predictors. We give an example below.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn.linear_model as lm
import sklearn as skl
import statsmodels.formula.api as smf

## this sets some style parameters
sns.set()

## Read in the data, then fit the regression of PD on the other imaging variables
dat = pd.read_csv("https://raw.githubusercontent.com/bcaffo/ds4bme_intro/master/data/oasis.csv")
results = smf.ols('PD ~ FLAIR + T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit()
print(results.summary2())
                 Results: Ordinary least squares
=================================================================
Model:              OLS              Adj. R-squared:     0.743   
Dependent Variable: PD               AIC:                95.4183 
Date:               2024-01-29 06:48 BIC:                116.2597
No. Observations:   100              Log-Likelihood:     -39.709 
Df Model:           7                F-statistic:        41.98   
Df Residuals:       92               Prob (F-statistic): 5.56e-26
R-squared:          0.762            Scale:              0.14081 
------------------------------------------------------------------
               Coef.   Std.Err.     t     P>|t|    [0.025   0.975]
------------------------------------------------------------------
Intercept      0.2349    0.1231   1.9086  0.0594  -0.0095   0.4794
FLAIR         -0.0160    0.0761  -0.2107  0.8336  -0.1671   0.1351
T1            -0.2116    0.0777  -2.7251  0.0077  -0.3659  -0.0574
T2             0.6078    0.0747   8.1323  0.0000   0.4593   0.7562
FLAIR_10      -0.2581    0.3078  -0.8386  0.4039  -0.8693   0.3532
T1_10          0.2212    0.1494   1.4810  0.1420  -0.0755   0.5179
T2_10          0.1103    0.2642   0.4177  0.6771  -0.4143   0.6350
FLAIR_20       1.8072    0.6423   2.8136  0.0060   0.5315   3.0828
-----------------------------------------------------------------
Omnibus:               2.142        Durbin-Watson:          2.187
Prob(Omnibus):         0.343        Jarque-Bera (JB):       1.725
Skew:                  -0.075       Prob(JB):               0.422
Kurtosis:              3.626        Condition No.:          40   
=================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the
errors is correctly specified.
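If you want the estimates programmatically rather than reading them off the printed table, the fitted results object exposes them directly. A minimal sketch using standard statsmodels accessors:

## Pull the FLAIR estimate and its 95% confidence interval
## from the fitted results object
print(results.params['FLAIR'])          # the FLAIR coefficient, about -0.0160
print(results.conf_int().loc['FLAIR'])  # its 95% confidence interval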

The interpretation of the FLAIR coefficient is as follows. We estimate an expected 0.0160 decrease in proton density per one unit change in FLAIR, with all of the remaining model terms held constant. The latter statement is important to remember: the coefficients are adjusted for the linear associations with the other variables. One way to think about this is that both the PD and FLAIR variables have had their linear associations with the other variables removed before being related to one another. The same is true for the other variables. The coefficient for T1 is interpreted similarly; it is the relationship between PD and T1 after the linear associations with the other variables have been removed from them both.
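To make the "held constant" phrasing concrete, consider predicting at two covariate settings that are identical except that FLAIR differs by exactly one unit; the difference in the predictions is exactly the FLAIR coefficient, since the model is linear. A minimal sketch, using the first row of the data as the baseline setting:

## Take one observed row of predictors as a baseline setting
cols = ['FLAIR', 'T1', 'T2', 'FLAIR_10', 'T1_10', 'T2_10', 'FLAIR_20']
row = dat.loc[[0], cols].copy()
bumped = row.copy()
bumped['FLAIR'] += 1   # change FLAIR by one unit, hold everything else fixed
## The difference in fitted predictions equals the FLAIR coefficient
print(results.predict(bumped).values - results.predict(row).values)

Now let's show the adjustment interpretation explicitly for the FLAIR variable.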

## Residuals of PD after regressing out everything except FLAIR
dat['PD_adjusted'] = smf.ols('PD ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid
## Residuals of FLAIR after regressing out the same variables
dat['FLAIR_adjusted'] = smf.ols('FLAIR ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid

## Regress the adjusted PD on the adjusted FLAIR
out = smf.ols('PD_adjusted ~ FLAIR_adjusted', data = dat).fit()
print(out.summary2())
                Results: Ordinary least squares
================================================================
Model:              OLS              Adj. R-squared:     -0.010 
Dependent Variable: PD_adjusted      AIC:                83.4183
Date:               2024-01-29 06:48 BIC:                88.6286
No. Observations:   100              Log-Likelihood:     -39.709
Df Model:           1                F-statistic:        0.04730
Df Residuals:       98               Prob (F-statistic): 0.828  
R-squared:          0.000            Scale:              0.13219
----------------------------------------------------------------
                   Coef.  Std.Err.    t    P>|t|   [0.025 0.975]
----------------------------------------------------------------
Intercept         -0.0000   0.0364 -0.0000 1.0000 -0.0722 0.0722
FLAIR_adjusted    -0.0160   0.0737 -0.2175 0.8283 -0.1623 0.1303
----------------------------------------------------------------
Omnibus:              2.142        Durbin-Watson:          2.187
Prob(Omnibus):        0.343        Jarque-Bera (JB):       1.725
Skew:                 -0.075       Prob(JB):               0.422
Kurtosis:             3.626        Condition No.:          2    
================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the
errors is correctly specified.

Notice that the coefficient is exactly the same (-0.0160). This highlights how linear regression "adjusts" for the other variables: it removes the linear association with them from both the explanatory and outcome variables.
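This equality is not an approximation; it holds exactly (a result known as the Frisch-Waugh-Lovell theorem). A quick sketch of a numerical check, comparing the two fitted objects from above:

## Verify that the full-model FLAIR coefficient matches the
## coefficient from the adjusted regression
print(np.isclose(results.params['FLAIR'], out.params['FLAIR_adjusted']))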