Interpretation of linear regression coefficients.
The module statsmodels gives a particularly convenient R-like formula approach to fitting linear models. It allows for a model specification of the form outcome ~ predictors. We give an example below.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn.linear_model as lm
import sklearn as skl
import statsmodels.formula.api as smf
## this sets some style parameters
sns.set()
## Read in the data
dat = pd.read_csv("https://raw.githubusercontent.com/bcaffo/ds4bme_intro/master/data/oasis.csv")
results = smf.ols('PD ~ FLAIR + T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit()
print(results.summary2())
                 Results: Ordinary least squares
=================================================================
Model:              OLS              Adj. R-squared:     0.743
Dependent Variable: PD               AIC:                95.4183
Date:               2024-01-29 06:48 BIC:                116.2597
No. Observations:   100              Log-Likelihood:     -39.709
Df Model:           7                F-statistic:        41.98
Df Residuals:       92               Prob (F-statistic): 5.56e-26
R-squared:          0.762            Scale:              0.14081
------------------------------------------------------------------
             Coef.   Std.Err.     t     P>|t|   [0.025   0.975]
------------------------------------------------------------------
Intercept    0.2349    0.1231   1.9086  0.0594  -0.0095   0.4794
FLAIR       -0.0160    0.0761  -0.2107  0.8336  -0.1671   0.1351
T1          -0.2116    0.0777  -2.7251  0.0077  -0.3659  -0.0574
T2           0.6078    0.0747   8.1323  0.0000   0.4593   0.7562
FLAIR_10    -0.2581    0.3078  -0.8386  0.4039  -0.8693   0.3532
T1_10        0.2212    0.1494   1.4810  0.1420  -0.0755   0.5179
T2_10        0.1103    0.2642   0.4177  0.6771  -0.4143   0.6350
FLAIR_20     1.8072    0.6423   2.8136  0.0060   0.5315   3.0828
-----------------------------------------------------------------
Omnibus:              2.142       Durbin-Watson:         2.187
Prob(Omnibus):        0.343       Jarque-Bera (JB):      1.725
Skew:                -0.075       Prob(JB):              0.422
Kurtosis:             3.626       Condition No.:         40
=================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the
errors is correctly specified.
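Before turning to interpretation, note that the quantities in the printed table can also be pulled out of the fitted results object directly. As a minimal sketch, params, pvalues, and conf_int() are standard attributes and methods of statsmodels results objects:

## Extract the FLAIR estimate and p-value, and all 95% confidence intervals
print(results.params['FLAIR'])
print(results.pvalues['FLAIR'])
print(results.conf_int())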
The interpretation of the FLAIR coefficient is as follows. We estimate an expected 0.0160 decrease in proton density per one unit change in FLAIR, with all of the remaining model terms held constant. The latter statement is important to remember: coefficients are adjusted for the linear associations with the other variables. One way to think about this is that both the PD and FLAIR variables have had their linear association with the other variables removed before being related to one another. The same is true for the other variables. The coefficient for T1 is interpreted similarly; it's the relationship between PD and T1 after the linear associations with the other variables have been removed from both.
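In symbols, the fitted model is

$$
\text{PD} = \beta_0 + \beta_1 \text{FLAIR} + \beta_2 \text{T1} + \beta_3 \text{T2} + \beta_4 \text{FLAIR\_10} + \beta_5 \text{T1\_10} + \beta_6 \text{T2\_10} + \beta_7 \text{FLAIR\_20} + \epsilon,
$$

and the estimate $\hat{\beta}_1 = -0.0160$ compares the expected PD at FLAIR values one unit apart, with every other term in the sum held fixed. Let's show the residualization interpretation explicitly for the FLAIR variable.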
## Residuals of PD after regressing out every predictor except FLAIR
dat['PD_adjusted'] = smf.ols('PD ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid
## Residuals of FLAIR after regressing out the same predictors
dat['FLAIR_adjusted'] = smf.ols('FLAIR ~ T1 + T2 + FLAIR_10 + T1_10 + T2_10 + FLAIR_20', data = dat).fit().resid
## Regress the adjusted PD on the adjusted FLAIR
out = smf.ols('PD_adjusted ~ FLAIR_adjusted', data = dat).fit()
print(out.summary2())
                Results: Ordinary least squares
================================================================
Model:              OLS              Adj. R-squared:     -0.010
Dependent Variable: PD_adjusted      AIC:                83.4183
Date:               2024-01-29 06:48 BIC:                88.6286
No. Observations:   100              Log-Likelihood:     -39.709
Df Model:           1                F-statistic:        0.04730
Df Residuals:       98               Prob (F-statistic): 0.828
R-squared:          0.000            Scale:              0.13219
----------------------------------------------------------------
                  Coef.   Std.Err.     t     P>|t|   [0.025   0.975]
----------------------------------------------------------------
Intercept        -0.0000    0.0364  -0.0000  1.0000  -0.0722   0.0722
FLAIR_adjusted   -0.0160    0.0737  -0.2175  0.8283  -0.1623   0.1303
----------------------------------------------------------------
Omnibus:              2.142       Durbin-Watson:        2.187
Prob(Omnibus):        0.343       Jarque-Bera (JB):     1.725
Skew:                -0.075       Prob(JB):             0.422
Kurtosis:             3.626       Condition No.:        2
================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the
errors is correctly specified.
Notice that the coefficient is exactly the same (-0.0160). This highlights how linear regression "adjusts" for the other variables: it removes the linear association with them from both the explanatory and outcome variables.
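This equality is exact (it is the Frisch-Waugh-Lovell theorem), so as a quick numerical sketch reusing the results and out objects fit above, the two estimates should agree up to floating point error:

## Compare the FLAIR coefficient from the full model with the one
## from the regression of the residualized variables
print(results.params['FLAIR'], out.params['FLAIR_adjusted'])
print(np.isclose(results.params['FLAIR'], out.params['FLAIR_adjusted']))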