import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import sklearn as skl
Causality#
Associational models versus causal models#
Causal models differ from associational models in that they codify causal directions, not just associations. In our program, you might have learned about the use of propensity scores, counterfactuals or randomization to study causality. There, the goal is typically to make causal statements with as few assumptions as possible, or at least with a clear understanding of the assumptions. Typically, the object of study is the estimation of an effect averaged over covariates.
Causal graphs take a different approach, even if they wind up at the same place. Here, the goal is to postulate hypothetical causal relationships and use those hypothetical relationships to estimate causal effects.
Graphs and graphical models#
A graph, $G = (V, E)$, is a collection of nodes (vertices), $V$, and edges, $E$, connecting pairs of nodes. In our setting, nodes represent variables and directed edges (arrows) represent hypothesized direct causal relationships. A node $X_i$ is a parent of $X_j$ (and $X_j$ a child of $X_i$) if there is an arrow from $X_i$ to $X_j$.
DAG and SCMs#
DAGs (directed acyclic graphs, i.e. directed graphs with no directed cycles) define a unique factorization (set of independence relationships) with compatible probability models. I find it useful to think of causal DAGs in terms of structural causal models (SCMs). Such models are an example of generative models that satisfy the DAG and have clear connections with the probability models. An SCM over a collection of variables, $X_1, \ldots, X_p$, specifies each variable as a function of its parents in the graph and an independent error term: $X_i = f_i(\mathrm{pa}(X_i), \epsilon_i)$.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X1", pos = (0.5, 1) )
G.add_node("X2", pos = (0 , 0) )
G.add_node("X3", pos = (1 , 0) )
G.add_edges_from([
["X1", "X2"],
["X1", "X3"],
["X2", "X3"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

In this case, the DAG implies the factorization $P(X_1, X_2, X_3) = P(X_1) P(X_2 ~|~ X_1) P(X_3 ~|~ X_1, X_2)$, and a compatible SCM would be $X_1 = f_1(\epsilon_1)$, $X_2 = f_2(X_1, \epsilon_2)$, $X_3 = f_3(X_1, X_2, \epsilon_3)$.
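As a concrete sketch, here is a simulation from one SCM compatible with this DAG; the linear functions and Gaussian errors are arbitrary choices for illustration, not implications of the graph.

import numpy as np

rng = np.random.default_rng(2023)
n = 10_000

# one arbitrary SCM consistent with X1 -> X2, X1 -> X3, X2 -> X3
e1, e2, e3 = rng.normal(size=(3, n))
x1 = e1                        # X1 = f1(e1)
x2 = 0.8 * x1 + e2             # X2 = f2(X1, e2)
x3 = 0.5 * x1 + 0.7 * x2 + e3  # X3 = f3(X1, X2, e3)

# a fully connected DAG implies no independence relationships,
# and indeed all pairwise correlations are far from zero
print(np.corrcoef([x1, x2, x3]).round(2))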
By itself, a DAG and a compatible SCM do not make any causal claims. However, the following strategy could: postulate a causal model, like the SCM; derive the independence relationships implied by it; then compare those independence relationships with the ones seen in the observed data. This gives us a method to falsify causal models using the data.
One specific way in which we use the assumptions is to investigate how the graph changes when we fix a node at a specific value, like an intervention, thus breaking its association with its parents. This operation is conceptual, but at times we can relate the probabilities of interventions that were never realized to quantities estimable from the observed data. Consider, for instance, fixing $X_2$ in the graph above: the arrow from $X_1$ into $X_2$ is deleted, while the arrows out of $X_2$ remain.
Blocking and d-separation#
Before we talk about interventions, let's discuss the compatibility of a hypothetical directed graph with our observed data. Return to our previous diagram, where:
- $X_1$ is a confounder between $X_2$ and $X_3$,
- $X_2$ is a mediator between $X_1$ and $X_3$, and
- $X_3$ is a collider between $X_1$ and $X_2$.
Consider an example.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("BMI35", pos = (0.5, 1) )
G.add_node("SDB", pos = (0 , 0) )
G.add_node("HTN", pos = (1 , 0) )
G.add_edges_from([
["BMI35", "SDB"],
["BMI35", "HTN"],
["SDB", "HTN"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 4000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

Here if we’re studying whether SDB causes HTN, BMI35 confounds the relationship as a possible common cause of both. We would need to adjust for BMI35 to make sure the association between SDB and HTN isn’t just due to this common cause.
If we were studying whether BMI35 causes HTN, we might be interested in how much of that effect is mediated (indirectly) through SDB and how much is directly from BMI35.
If we are studying the relationship between BMI35 and SDB directly, adjusting for HTN may create an association. Consider the (fictitious) case where there is a large number of people who have SDB but are not obese, yet all have hypertension, for whatever reason. Then, among those with HTN, there could be a negative association between BMI35 and SDB, driven by the large collection of patients who have SDB and are not obese, and likewise those who are obese and not hypertensive. That is, adjusting for HTN created an association. This is an example of Berkson's paradox. It is a somewhat contrived example, but hopefully you get the point.
The Wikipedia article on Berkson's paradox has further amusing examples of associations created by this kind of selection.
The main point here is that adjusting for colliders may open up a pathway between the nodes.
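A quick, made-up simulation illustrates the point. For simplicity it drops the BMI35 $\rightarrow$ SDB edge and generates BMI35 and SDB independently; both raise the chance of HTN, and conditioning on HTN induces a negative association between them.

import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# hypothetical model: BMI35 and SDB are independent causes of HTN
bmi35 = rng.binomial(1, 0.3, n)
sdb = rng.binomial(1, 0.3, n)
htn = rng.binomial(1, 0.05 + 0.4 * bmi35 + 0.4 * sdb)

def assoc(a, b):
    # P(a = 1 | b = 1) - P(a = 1 | b = 0)
    return a[b == 1].mean() - a[b == 0].mean()

print("marginal:", round(assoc(bmi35, sdb), 3))                      # ~ 0
print("among HTN:", round(assoc(bmi35[htn == 1], sdb[htn == 1]), 3)) # < 0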
A path between two nodes is a sequence of edges connecting them, regardless of which way the arrows point.

A path between $X$ and $Y$ is blocked by a set of nodes $Z$ if:

1. the path contains a node $W$ that is a mediator ($\rightarrow W \rightarrow$, in either direction) or a confounder ($\leftarrow W \rightarrow$) between its neighbors on the path, and $W$ is in $Z$; or
2. the path contains a node $W$ that is a collider ($\rightarrow W \leftarrow$) between its neighbors on the path, and neither $W$ nor any of its descendants is in $Z$.

In other words, a path is blocked if a mediator or confounder on it is included in $Z$, or a collider on it (together with all of that collider's descendants) is excluded from $Z$. This could be translated into the following statistical statement: conditioning on a mediator or confounder, or not conditioning on a collider, blocks a path; conditioning on a collider opens a path.
We say that two nodes or groups of nodes are d-separated by a set of nodes, $Z$, if every path between them is blocked by $Z$. In any probability distribution compatible with the DAG, d-separated nodes are conditionally independent given $Z$.
Consider the following graph.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X1", pos = (0, 0) )
G.add_node("X2", pos = (0, 1) )
G.add_node("X3", pos = (1, 1.5) )
G.add_node("X4", pos = (1, .5) )
G.add_node("X5", pos = (2, 1) )
G.add_node("X6", pos = (2, 0) )
G.add_edges_from([
["X2", "X1" ],
["X2", "X3"],
["X3", "X1" ],
["X3", "X4"],
["X4", "X6" ],
["X5", "X3"],
["X5", "X6" ]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 2.3])
ax.set_ylim([-.3, 2.3])
plt.show()

For example, $X_2$ and $X_5$ are marginally independent, since every path between them passes through an unconditioned collider. Another interesting one is that $X_3$ and $X_6$ are d-separated by $\{X_4, X_5\}$.
Here’s the complete set of conditional independence relationships.
- $X_1$ and $X_4$ are d-separated by $\{X_3\}$.
- $X_1$ and $X_5$ are d-separated by $\{X_2, X_3\}$.
- $X_1$ and $X_6$ are d-separated by $\{X_3, X_5\}$.
- $X_2$ and $X_4$ are d-separated by $\{X_3\}$.
- $X_2$ and $X_5$ are d-separated by $\{\}$ (the null set, i.e. they're marginally independent).
- $X_2$ and $X_6$ are d-separated by $\{X_3, X_5\}$.
- $X_3$ and $X_6$ are d-separated by $\{X_4, X_5\}$.
- $X_4$ and $X_5$ are d-separated by $\{X_3\}$.
These all imply conditional independence relationships in compatible distributions, such as $X_4 \perp X_5 ~|~ X_3$, i.e. $P(X_4, X_5 ~|~ X_3) = P(X_4 ~|~ X_3) P(X_5 ~|~ X_3)$.
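These claims can also be checked programmatically. Recent versions of networkx ship a d-separation test, nx.d_separated (renamed nx.is_d_separator in networkx 3.3); here is a sketch against the graph above.

import networkx as nx

G = nx.DiGraph([("X2", "X1"), ("X2", "X3"), ("X3", "X1"), ("X3", "X4"),
                ("X4", "X6"), ("X5", "X3"), ("X5", "X6")])

# spot-check a few of the relationships listed above
print(nx.d_separated(G, {"X1"}, {"X4"}, {"X3"}))        # True
print(nx.d_separated(G, {"X2"}, {"X5"}, set()))         # True (marginal)
print(nx.d_separated(G, {"X1"}, {"X5"}, {"X2", "X3"}))  # True
# conditioning on the collider X3 alone opens X1 <- X2 -> X3 <- X5
print(nx.d_separated(G, {"X1"}, {"X5"}, {"X3"}))        # False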
Do calculus and backdoor criterion#
Recall that specifying a causal graph implies the independence relationships of a probability distribution, under assumptions such as the SCM. Sometimes, we're interested in the causal relationship between an exposure, $X$, and an outcome, $Y$, and in which variables must be adjusted for to estimate it. The backdoor criterion addresses this.
A set of variables $Z$ satisfies the backdoor criterion relative to $(X, Y)$ if:

1. No descendant of $X$ is in $Z$.
2. $Z$ blocks every path between $X$ and $Y$ that contains an arrow pointing into $X$.
The backdoor criterion is similar to d-separation. However, we only focus on arrows pointing into $X$ (the backdoor paths); directed paths out of $X$ carry the effect we want and are left alone.
The magic of the backdoor adjustment comes from the relationship, the adjustment formula:

$$P(Y ~|~ Do(X) = x) = \sum_z P(Y ~|~ X = x, Z = z) P(Z = z),$$

where $Z$ satisfies the backdoor criterion and the sum is over its possible values.
So, in our earlier example, adjusting for BMI35 satisfies the backdoor criterion for estimating the effect of SDB on HTN.
It's important to emphasize that every aspect of the adjustment formula is theoretically estimable from observational data if $X$, $Y$ and $Z$ are all observed; the right hand side contains no interventional quantities.
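To make the formula concrete, here is a plug-in (standardization) sketch on data simulated from a simple binary model with one confounder; all of the coefficients are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# simulate Z -> X, Z -> Y, X -> Y; true causal risk difference is 0.3
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.5 * z)
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)

def p_y_do_x(xval):
    # adjustment formula: sum_z P(Y = 1 | X = x, Z = z) P(Z = z)
    return sum(y[(x == xval) & (z == zval)].mean() * (z == zval).mean()
               for zval in (0, 1))

print(p_y_do_x(1) - p_y_do_x(0))            # ~ 0.3, the causal effect
print(y[x == 1].mean() - y[x == 0].mean())  # confounded crude contrast, larger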
Consider the following graph.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Y", pos = (2, 0) )
G.add_node("X2", pos = (0, 1) )
G.add_node("X3", pos = (1, 1.5) )
G.add_node("X4", pos = (1, .5) )
G.add_node("X5", pos = (2, 1) )
G.add_edges_from([
["X", "Y" ],
["X2", "X" ],
["X2", "X3"],
["X3", "X" ],
["X3", "X4"],
["X4", "Y" ],
["X5", "X3"],
["X5", "Y" ]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 2.3])
ax.set_ylim([-.3, 2.3])
plt.show()

Here are the minimal backdoor adjustment sets between $X$ and $Y$: $\{X_2, X_3\}$, $\{X_3, X_5\}$ and $\{X_4, X_5\}$.
Here are some invalid backdoor sets of variables, taking $Z$ equal to a single node:

- $\{X_2\}$ does not block the path $X \leftarrow X_3 \rightarrow X_4 \rightarrow Y$.
- $\{X_3\}$ does not block the path $X \leftarrow X_2 \rightarrow X_3 \leftarrow X_5 \rightarrow Y$; conditioning on the collider $X_3$ is exactly what opens it.
- $\{X_4\}$ does not block the path $X \leftarrow X_3 \leftarrow X_5 \rightarrow Y$.
- $\{X_5\}$ does not block the path $X \leftarrow X_2 \rightarrow X_3 \rightarrow X_4 \rightarrow Y$.
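Candidate sets can also be checked programmatically: a set containing no descendants of $X$ satisfies the backdoor criterion exactly when it d-separates $X$ and $Y$ in the graph with $X$'s outgoing edges removed. A sketch, again relying on nx.d_separated:

import networkx as nx

G = nx.DiGraph([("X", "Y"), ("X2", "X"), ("X2", "X3"), ("X3", "X"),
                ("X3", "X4"), ("X4", "Y"), ("X5", "X3"), ("X5", "Y")])

def satisfies_backdoor(G, x, y, Z):
    # condition 1: no descendant of x may be in Z
    if set(Z) & nx.descendants(G, x):
        return False
    # condition 2: remove edges out of x, then test d-separation
    H = G.copy()
    H.remove_edges_from(list(G.out_edges(x)))
    return nx.d_separated(H, {x}, {y}, set(Z))

for Z in [{"X2", "X3"}, {"X3", "X5"}, {"X4", "X5"}, {"X3"}, {"X2"}]:
    print(sorted(Z), satisfies_backdoor(G, "X", "Y", Z))  # 3 True, 2 False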
Example graphs#
In all of the following, we're interested in the causal effect of $X$ on $Y$.
Randomization#
If $R$ is a randomization that completely determines treatment status $X$, then there are no backdoor paths from $X$ to $Y$ (left plot below): the measured and unmeasured covariates, $Z$ and $U$, affect the outcome but not the treatment, so the crude association between $X$ and $Y$ is causal.
In contrast, if some people ignore their randomized treatment status and elect to choose a different treatment, one may have opened a backdoor path (right plot). For example, suppose the treatment can't be blinded, and those randomized to the control arm with the worst baseline symptoms elect to obtain the treatment elsewhere.
plt.figure(figsize=[6, 3])
plt.subplot(1, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("R", pos = (0 , 1) )
G.add_node("U,Z", pos = (1, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["R", "X"],
["X", "Y"],
["U,Z", "Y"],
["Y", "U,Z"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(1, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("R", pos = (0 , 1) )
G.add_node("U,Z", pos = (1, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["R", "X"],
["X", "Y"],
["U,Z", "Y"],
["U,Z", "X"],
["Y", "U,Z"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

Simple confounding#
The diagram below shows classic confounding. Conditioning on $Z$ blocks the backdoor path $X \leftarrow Z \rightarrow Y$.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["Z", "X"],
["Z", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

Now the estimate of the adjusted effect (under our assumptions) is given by the adjustment formula, $\sum_z P(Y ~|~ X = x, Z = z) P(Z = z)$.
In the following two examples, the unmeasured confounder $U$ sits on the backdoor path between $X$ and $Y$, but the measured variable $Z$ also lies on that path, so conditioning on $Z$ blocks it and the effect remains estimable.
plt.figure(figsize=[6, 3])
plt.subplot(1, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Z", pos = (0, 1) )
G.add_node("U", pos = (1, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["U", "Z"],
["Z", "X"],
["U", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(1, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("U", pos = (0, 1) )
G.add_node("Z", pos = (1, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["U", "X"],
["U", "Z"],
["Z", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()
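A linear SCM sketch of the left graph (with arbitrary coefficients) shows that regressing on $Z$ recovers the effect of $X$ even though $U$ is never observed.

import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# left graph: U -> Z, Z -> X, U -> Y, X -> Y; true effect of X on Y is 2
u = rng.normal(size=n)
z = u + rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2 * x + u + rng.normal(size=n)

# unadjusted slope is biased by the path X <- Z <- U -> Y
print(np.polyfit(x, y, 1)[0])                # ~ 2.33

# adjusting for Z blocks the backdoor path, with U still unmeasured
design = np.column_stack([x, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta[0])                               # ~ 2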

Mediation#
In mediation, all or part of the effect of $X$ on $Y$ is transmitted through an intermediate variable, $Z$.
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["X", "Z"],
["Z", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

The backdoor criterion does not apply here, since $Z$ is a descendant of $X$: the path $X \rightarrow Z \rightarrow Y$ is a directed path carrying (part of) the effect, not a backdoor path, and adjusting for $Z$ would block the very effect we want to estimate.
@cinelli2021crash shows an interesting example of mediation where one would want to adjust for $Z$, a variable pointing into the mediator.
plt.figure(figsize=[10, 3])
plt.subplot(1, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("U", pos = (0 , 1) )
G.add_node("Z", pos = (1, 1) )
G.add_node("M", pos = (1 , 0) )
G.add_node("Y", pos = (2 , 0) )
G.add_edges_from([
["U", "X"],
["U", "Z"],
["X", "M"],
["Z", "M"],
["M", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 2.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(1, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (1, 1) )
G.add_node("M", pos = (1 , 0) )
G.add_node("Y", pos = (2 , 0) )
G.add_edges_from([
["Z", "X"],
["X", "M"],
["Z", "M"],
["M", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 2.3])
ax.set_ylim([-.3, 1.3])
plt.show()

In this case, conditioning on $Z$ blocks the backdoor path from $X$ to $Y$ that runs through the mediator $M$: $X \leftarrow U \rightarrow Z \rightarrow M \rightarrow Y$ in the left plot, and $X \leftarrow Z \rightarrow M \rightarrow Y$ in the right plot.
Bad controls#
The following are all cases where conditioning on $Z$ is unhelpful, or actively harmful, for estimating the effect of $X$ on $Y$.
Upper left: adjusting for colliders is the standard bad control. Here $Z$ is a common effect of $X$ and $Y$, and adjusting for it opens a noncausal path between them.
In the upper right diagram below, $Z$ causes only the exposure. Adjusting for it does nothing about the unmeasured confounder $U$, and can amplify the bias that $U$ induces.
In the lower left plot, $Z$ is a mediator carrying the entire effect of $X$ on $Y$; conditioning on it blocks the effect we are trying to estimate.
The lower right plot is similar. Conditioning on $Z$, a descendant of the mediator $M$, partially blocks the effect of $X$ on $Y$.
plt.figure(figsize=[6, 6])
#| echo: false
plt.subplot(2, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["X", "Z"],
["Y", "Z"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(2, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Z", pos = (0, 1) )
G.add_node("U", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "Y"],
["U", "X"],
["U", "Y"],
["Z", "X"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(2, 2, 3)
G = nx.DiGraph()
G.add_node("Z", pos = (0, 0) )
G.add_node("X", pos = (0, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "Z"],
["Z", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(2, 2, 4)
G = nx.DiGraph()
G.add_node("X", pos = (0, 1) )
G.add_node("M", pos = (0, 0) )
G.add_node("Z", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "M"],
["M", "Z"],
["M", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show();

Conditioning may help#
In the left plot, adjusting for $Z$ is not needed for bias, since $Z$ is a cause of $Y$ unconnected to $X$; however, adjusting for it can improve the precision of the estimated effect. In the right plot, $Z$ affects $Y$ only through the mediator $M$; again it is not needed for bias, but adjusting for it may similarly improve precision.
plt.figure(figsize=[6, 3])
plt.subplot(1, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Z", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "Y"],
["Z", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(1, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0, 1) )
G.add_node("M", pos = (0, 0) )
G.add_node("Z", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "M"],
["Z", "M"],
["M", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()
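A quick simulation (linear model, arbitrary coefficients) illustrates the precision gain in the left graph: both regressions estimate the effect of $X$ without bias, but including $Z$ shrinks the standard error.

import numpy as np

rng = np.random.default_rng(1)
n = 5_000

def slope_and_se(design, y):
    # OLS coefficient for the first column and its standard error
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[0], np.sqrt(cov[0, 0])

# left graph: X -> Y and Z -> Y, with X and Z independent
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 * x + 3.0 * z + rng.normal(size=n)

ones = np.ones(n)
print(slope_and_se(np.column_stack([x, ones]), y))     # ~ 1, larger SE
print(slope_and_se(np.column_stack([x, z, ones]), y))  # ~ 1, smaller SE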

Exercises#
Graphs#
Consider the following graph, where we want to answer the question: what is $P(Y ~|~ Do(X) = x)$?
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Z1", pos = (0, 1) )
G.add_node("Z2", pos = (0.5, 0.5) )
G.add_node("Z3", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "Y"],
["Z2", "X"],
["Z1", "X"],
["Z1", "Z2"],
["Z3", "Y"],
["Z3", "Z2"],
["Z2", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show()

1. What is the minimal set of adjustment variables for the backdoor criterion?
2. Is $Z_1$ independent of $Z_3$?
3. Is $Z_1$ independent of $Z_3$ given $Z_2$?
4. Given a cross sectional sample, if $Z_1$ is unobserved, give a formula for the estimation of $P(Y ~|~ Do(X) = x)$ that only requires observed variables.
#| echo: false
plt.figure(figsize=[3, 3])
G = nx.DiGraph()
G.add_node("X", pos = (0, 0) )
G.add_node("Z1", pos = (0, 1) )
G.add_node("Z2", pos = (1, 1) )
G.add_node("Y", pos = (1, 0) )
G.add_edges_from([
["X", "Y"],
["X", "Z2"],
["Z1", "X"],
["Z1", "Z2"],
["Z2", "Y"]
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3]);

1. What is the minimal set of adjustment variables for the backdoor criterion?
2. Given a cross sectional sample, give a formula for the estimation of $P(Y ~|~ Do(X) = x)$ that only requires observed variables.
Data exercise#
The Wikipedia page on Simpson's paradox gives this data concerning two treatments of kidney stones, the number and percentage of successful procedures, and the size of the stone. Note that among both large stones and small stones, treatment A is better than B; however, combined over stone sizes, B appears preferable to A.
| Size | Treatment | Success | N | Prop |
|---|---|---|---|---|
| Large | A | 192 | 263 | 73% |
| Large | B | 55 | 80 | 69% |
| Small | A | 81 | 87 | 93% |
| Small | B | 234 | 270 | 87% |
| Both | A | 273 | 350 | 78% |
| Both | B | 289 | 350 | 83% |
Estimate the treatment effect difference:

$$P(Success ~|~ Do(Treatment) = B) - P(Success ~|~ Do(Treatment) = A)$$

under each of the DAGs below, where $X$ is the treatment, $Y$ is success and $Z$ is stone size:
plt.figure(figsize=[6, 6])
#| echo: false
plt.subplot(2, 2, 1)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["Z", "X"],
["Z", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(2, 2, 2)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["X", "Z"],
["Z", "Y"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.subplot(2, 2, 3)
G = nx.DiGraph()
G.add_node("X", pos = (0 , 0) )
G.add_node("Z", pos = (0.5, 1) )
G.add_node("Y", pos = (1 , 0) )
G.add_edges_from([
["X", "Y"],
["X", "Z"],
["Y", "Z"],
] )
nx.draw(G,
nx.get_node_attributes(G, 'pos'),
with_labels=True,
font_weight='bold',
node_size = 2000,
node_color = "lightblue",
linewidths = 3)
ax= plt.gca()
ax.collections[0].set_edgecolor("#000000")
ax.set_xlim([-.3, 1.3])
ax.set_ylim([-.3, 1.3])
plt.show();
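As a sketch of the arithmetic: under the confounding DAG (first plot) the adjustment formula standardizes over stone size, while under the mediation and collider DAGs (second and third plots) the crude contrast is the interventional quantity. Both can be computed straight from the table.

# crude (unadjusted) success proportions
p_a_crude = 273 / 350
p_b_crude = 289 / 350
print(p_b_crude - p_a_crude)   # ~ +0.046, favors B

# adjustment formula, standardizing over stone size Z:
# P(Success | Do(T) = t) = sum_z P(Success | T = t, Z = z) P(Z = z)
p_large = (263 + 80) / 700
p_small = (87 + 270) / 700
p_a_adj = (192 / 263) * p_large + (81 / 87) * p_small
p_b_adj = (55 / 80) * p_large + (234 / 270) * p_small
print(p_b_adj - p_a_adj)       # ~ -0.054, favors A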

Comment on how reasonable each of these models is given the setting. Here's a reference: @julious1994confounding.
Give any other DAGs, perhaps including unmeasured variables, that you think are relevant.