Convolutions#
1D transforms#
Convolutions are an important topic in mathematics, statistics, signal processing and beyond. Let's discuss 1D convolutions first. A real valued convolution of two continuous signals, $f$ and $g$, is

$$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) \, d\tau = \int_{-\infty}^{\infty} f(t - \tau) g(\tau) \, d\tau,
$$

where the equality is determined by a simple change of variable argument. The discrete analog is

$$
(f * g)(t) = \sum_{\tau = -\infty}^{\infty} f(\tau) g(t - \tau).
$$
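As a quick sanity check on the discrete formula, here is a sketch that evaluates the sum directly and compares it to np.convolve; the signals and the helper function are just for illustration.

import numpy as np

## Directly evaluate (f * g)(t) = sum_tau f(tau) g(t - tau)
def conv_direct(f, g):
    n = len(f) + len(g) - 1
    out = np.zeros(n)
    for t in range(n):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out

f = np.array([1., 2., 3.])
g = np.array([0., 1., 0.5])
print(conv_direct(f, g))
print(np.convolve(f, g))   ## numpy's 'full' convolution agrees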
The convolution has many, many uses in data science and statistics. For example, the convolution of densities or mass functions is the respective density or mass function for the sum of independent random variables from those distributions. In applied data analysis, you can think of the convolution between a kernel, $K$, and a signal, $X = (X_1, \ldots, X_N)$, as a weighted moving average of $X$ with weights given by $K$. For example, if $K = (1/3, 1/3, 1/3)$, the convolution contains the terms

$$
\frac{X_{m-1} + X_m + X_{m+1}}{3}.
$$

That is, our convolution is a moving average of every three adjacent points of $X$ (this is what np.convolve returns with mode = "valid").
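Both facts are easy to check numerically; here is a minimal sketch with toy numbers (the values are just for illustration).

import numpy as np

## Convolving the mass function of a fair die with itself gives the
## mass function of the sum of two independent rolls (outcomes 2 to 12)
die = np.ones(6) / 6
two_dice = np.convolve(die, die)
print(np.round(two_dice * 36))   ## the familiar 1, 2, ..., 6, ..., 2, 1 counts

## Convolving with a length 3 box kernel is a 3 point moving average
X = np.array([1., 3., 2., 5., 4.])
K = np.ones(3) / 3
print(np.convolve(X, K, 'valid'))
print([(X[m - 1] + X[m] + X[m + 1]) / 3 for m in [1, 2, 3]])   ## same values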
Note we could make the kernel weight points differently than just a box kernel. A popular choice is a Gaussian distribution.
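For example, here is a sketch of a discretized, normalized Gaussian kernel; the width and bandwidth values are arbitrary choices for illustration.

import numpy as np

M = 41                                ## kernel width
sigma = 7                             ## bandwidth, an arbitrary choice
grid = np.arange(M) - (M - 1) / 2
K_gauss = np.exp(-grid ** 2 / (2 * sigma ** 2))
K_gauss = K_gauss / K_gauss.sum()     ## weights sum to 1
## np.convolve(X, K_gauss, 'same') would then be a Gaussian weighted
## moving average rather than an equally weighted (box) one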
Also, the convolution output does not automatically line up with $X$: for an $X$ of length $N$ and a kernel of length $M$, the full convolution has $N + M - 1$ points. Numpy will zero pad $X$ and return the central $N$ points so that the result matches up with $X$ (mode = "same"). Or, you can just do it yourself: use mode = "full" and shift the result by $(M-1)/2$ points, or use mode = "valid" (but the convolution has fewer points in this case, $N - M + 1$, so it won't have corresponding points with $X$ at the ends).
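Here is a quick sketch of the bookkeeping for the three modes with toy sizes (the numbers are just for illustration).

import numpy as np

N, M = 100, 7
X = np.random.normal(size = N)
K = np.ones(M) / M
print(len(np.convolve(X, K, 'valid')))   ## N - M + 1 = 94
print(len(np.convolve(X, K, 'same')))    ## N = 100
print(len(np.convolve(X, K, 'full')))    ## N + M - 1 = 106
## Dropping (M - 1) / 2 points from each end of 'full' recovers 'same'
full = np.convolve(X, K, 'full')
same = np.convolve(X, K, 'same')
print(np.allclose(full[(M - 1) // 2 : (M - 1) // 2 + N], same))   ## True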
Here’s an example using Italy’s daily covid case count data. We plot the data and the convolution smoothed data. In the bottom panels, we show the residuals to highlight the difference.
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
dat = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
## Get Italy, drop everything except the dates, convert to long (unstack converts to a multi-indexed series)
X = dat[dat['Country/Region'] == 'Italy'].drop(["Province/State", "Country/Region", "Lat", "Long"], axis=1).unstack()
## convert from a pandas series to a numpy array
X = np.asarray(X)
## get case counts instead of cumulative counts
X = X[1 : X.size] - X[0 : (X.size - 1)]
## get the first non zero entry
X = X[np.min(np.where(X != 0)) : X.size]
plt.plot(X)

## 41 day moving average
N = len(X)
M = 41
fig, axes = plt.subplots(2, 3, figsize = [12.4, 12.4])
axes[0,0].plot(X)
axes[0,1].plot(X)
axes[0,2].plot(X)
K = np.ones(M) / M
## Plot the convolution with the argument 'same'
## this gives N (assumed greater than M) points
XC = np.convolve(X, K, 'same')
axes[0,0].plot(XC)
axes[1,0].plot(X - XC)
## Plot the convolution with the argument 'full'
## which gives N+M-1 total points
XC = np.convolve(X, K, 'full')
temp = np.pad(X, (M-1, 0), 'constant')
axes[0,1].plot(XC)
axes[1,1].plot(temp - XC)
## Plot the full convolution with (M-1)/2 points trimmed from each end so that it lines up with X
XCshifted = XC[ (int((M - 1)/2)) : int(len(XC) - (M - 1)/2) ]
axes[0, 2].plot(XCshifted)
axes[1, 2].plot(X - XCshifted)

Let's show that the first and last points of the "same" convolution are averages of $X$ over the kernel width, with the zero padding filling in for the points that fall off the ends. Specifically, the first convolution point is the sum of the first $(M+1)/2$ points of $X$ divided by $M$ (the remaining $(M-1)/2$ kernel weights land on zeros), and similarly for the last point.
temp = np.convolve(X, K, 'same')
[
# the first convolution point (temp[0]) and the average of the
# first (M+1)/2 points of X and (M-1)/2 zeros from the padding
[temp[0], X[0 : int((M + 1) / 2)].sum() / M],
# the last convolution point (temp[N-1]) and the average of the
# last (M+1)/2 points of X and (M-1)/2 zeros from the padding
[temp[N - 1], X[int(N - (M + 1) / 2) : N].sum() / M]
]
[[0.07317073170731708, 0.07317073170731707],
[2059.5853658536585, 2059.5853658536585]]
Also, I averaged over a wide window (41 days) in order to make the shift very apparent. Let's look at the performance of a narrower kernel.
## 21 day moving average
M = 21
K = np.ones(M) / M
fig, axes = plt.subplots(1, 2, figsize = [12.4, 6.2])
XC = np.convolve(X, K, 'same')
axes[0].plot(X)
axes[0].plot(XC)
axes[1].plot(X - XC)

It should be stated that convolution in the time domain is multiplication in Fourier space. So fast convolution routines (for example scipy.signal.fftconvolve) perform FFTs in the background; np.convolve itself uses the direct sum. However, if you're going to do this yourself, make sure to keep track of indices and zero padding (i.e. the bookkeeping). Otherwise, the FFT wraps around and you get a little of the end averaged in with the beginning and vice versa. I work out getting the same answer as mode = "same"
below.
fig, axes = plt.subplots(1, 2, figsize = [12.4, 6.2])
## Pad the X with zeros in the back, need at least M-1
pad_width = (0, M - 1)
Xpadded = np.pad(X, pad_width, "constant")
## Pad the kernel in the back with N-1 zeros, so both the kernel
## and the X are of length N + M - 1
Kpadded = np.pad(K, (0, N - 1))
## Note we take the real part because the imaginary part is effectively
## machine 0
convolution = np.fft.ifft(np.fft.fft(Xpadded) * np.fft.fft(Kpadded)).real
## At this point the convolution is of length N + M - 1
## To get it comparable with the original X, drop (M-1)/2 points
## from each end
convolution = convolution[ int((M-1)/2) : int(N+(M-1)/2)]
## Let's see how we did
axes[0].plot(X)
axes[0].plot(convolution)
## Show they're the same by plotting the difference
axes[1].plot(convolution - XC)

2D transforms#
For two dimensions, the convolution is similar:

$$
(K * X)(i, j) = \sum_{m} \sum_{n} K(m, n) X(i - m, j - n).
$$

Once again, let's think about the case where the kernel is a box, say a $P \times P$ matrix with every entry equal to $1 / P^2$ (i.e. a 2D moving average). That is, the convolution at point $(i, j)$ is the average of the image values in a $P \times P$ neighborhood around $(i, j)$.
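A quick numerical check of that interpretation, using a small random array and a box kernel (the sizes and the checked index are arbitrary, just for illustration).

import numpy as np
import scipy.signal as sp

img_small = np.random.normal(size = (10, 10))
P = 3
K2 = np.ones((P, P)) / P ** 2
C = sp.convolve2d(img_small, K2, mode = 'valid')
## Each entry of the 'valid' convolution is the mean of the P x P
## block of the input that the kernel covers at that position
print(np.isclose(C[2, 4], img_small[2 : 2 + P, 4 : 4 + P].mean()))   ## True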
For regular kernels (box kernels, 2D Gaussians), convolution smooths the image, which has the effect of making it blurrier. The kernel width determines how blurry the image will then be. This is typically done to denoise an image (to blur out the noise). Let's try it on a cartoon image of Brian. We'll just stick to a black and white image so that it's 2D. A color image has 3 color channels, so it is a 3D array. (However, you see the pattern; you should be able to extend this to 3D with little problem.)
import PIL
import scipy.signal as sp
import urllib.request
imgURL = "https://github.com/smart-stats/ds4bio_book/raw/main/book/bcCartoon.png"
urllib.request.urlretrieve(imgURL, "bcCartoon.png")
img = np.asarray(PIL.Image.open("bcCartoon.png").convert("L"))
plt.xticks([])
plt.yticks([])
plt.imshow(img, cmap='gray', vmin=0, vmax=255)

def kernel(i, j):
    return np.ones((i, j)) / np.prod([i, j])
plt.figure(figsize=[12.4, 12.4])
imgC = sp.convolve2d(img, kernel(4, 4))
plt.subplot(2, 2, 1)
plt.xticks([])
plt.yticks([])
plt.imshow(imgC, cmap='gray', vmin=0, vmax=255)
plt.title("4x4")
imgC = sp.convolve2d(img, kernel(8, 8))
plt.subplot(2, 2, 2)
plt.xticks([])
plt.yticks([])
plt.imshow(imgC, cmap='gray', vmin=0, vmax=255)
plt.title("8x8")
imgC = sp.convolve2d(img, kernel(16, 16))
plt.subplot(2, 2, 3)
plt.xticks([])
plt.yticks([])
plt.imshow(imgC, cmap='gray', vmin=0, vmax=255)
plt.title("16x16")
imgC = sp.convolve2d(img, kernel(32,32))
plt.subplot(2, 2, 4)
plt.xticks([])
plt.yticks([])
plt.imshow(imgC, cmap='gray', vmin=0, vmax=255)
plt.title("32x32")

Convolutional neural networks#
Of course, your kernel doesn't have to be a box, or a truncated, discretized bivariate Gaussian density, or even be non-negative. Non-negative kernels are natural for smoothers, since then the convolution is just a generalized version of a moving average. But we want to use convolutions more generally. Here, let's take a kernel that is part of the image (the left eye) and convolve it with the image. I'll make the kernel sharply peaked at eye-like features by extracting the eye and raising it to the 4th power.
So a relu activation function plus a bias term would then be able to highlight different thresholded variations of this convolution image. For example, here I add a bias term to the convolution then apply a leaky relu. You can see it just highlights the one area where the eye is. A leaky relu is

$$
f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise}, \end{cases}
$$

where $\alpha$ is a small positive constant (0.05 in the code below).
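Written as a plain function, a leaky relu looks like the sketch below; the slope of 0.05 matches the value used in the code that follows.

import numpy as np

def leaky_relu(x, slope = 0.05):
    ## pass positive values through, shrink negative values toward zero
    return np.where(x > 0, x, slope * x)

print(leaky_relu(np.array([-2., 0.5, 3.])))   ## [-0.1   0.5   3. ]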
plt.figure(figsize=[12.4, 6.2])
K = img[200 : 270, 225 : 322]
plt.subplot(1, 3, 1)
plt.xticks([])
plt.yticks([])
plt.imshow(K, cmap='gray', vmin=0, vmax=255)
## I normalized it this way so that the convolution
## numbers wouldn't be so big
## Also, I put it to the 4th power, so it exactly finds
## the eye.
K = K ** 4
K = K / K.sum()
K = K - K.mean()
imgC = sp.convolve2d(img, K)
plt.subplot(1, 3, 2)
plt.xticks([])
plt.yticks([])
plt.imshow(imgC)
plt.title("Convolution")
temp = imgC.copy()
## Add a bias term of -15
temp -= 15
## Perform a leaky relu
temp[np.where(temp < 0)] = temp[np.where(temp < 0)] * .05
plt.subplot(1, 3, 3)
plt.imshow(temp)
plt.xticks([])
plt.yticks([])
plt.title("LRELU of convolution + bias")

Because of how convolutions work, this will find this eye anywhere in the image. Here we just add another eye somewhere else and repeat the convolution.
plt.figure(figsize=[12.4, 6.2])
#put another eye in the image
imgCopy = img.copy()
imgCopy[60 : 130, 85 : 182] = img[200 : 270, 225 : 322]
plt.subplot(1, 2, 1)
plt.imshow(imgCopy, cmap='gray', vmin=0, vmax=255)
plt.xticks([])
plt.yticks([])
imgC = sp.convolve2d(imgCopy, K)
plt.subplot(1, 2, 2)
temp = imgC.copy()
## Add a bias term of -15
temp -= 15
## Perform a leaky relu
temp[np.where(temp < 0)] = temp[np.where(temp < 0)] * .05
plt.imshow(temp)
plt.xticks([])
plt.yticks([])
plt.title("LRELU of convolution + bias")

So, we found a custom kernel that highlights this specific feature in images. Convolutional layers learn the kernel. That is, CNNs learn the kernel that gets convolved with the previous layer's output to produce the next layer. Here's a really great pictorial guide by Sumit Saha.
Now, let's discuss some specific vocabulary used in CNNs; a minimal sketch tying these pieces together follows the list.

- Padding: zero padding of the input, just like we discussed for the 1D transforms.
- Pooling: often max pooling, a dimension reduction technique that takes the max over small blocks of the convolution output.
- Stride length: instead of sliding the kernel one pixel at a time, move it by more pixels to increase computational efficiency and reduce the size of the output convolution.
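Here is a minimal sketch of how these pieces fit together in code, assuming PyTorch is available; the layer sizes are arbitrary choices for illustration.

import torch
import torch.nn as nn

## One grayscale input channel, 4 learned 5x5 kernels,
## zero padding of 2 pixels, and a stride of 1
conv = nn.Conv2d(in_channels = 1, out_channels = 4, kernel_size = 5,
                 padding = 2, stride = 1)
relu = nn.LeakyReLU(0.05)
## Max pooling over 2x2 blocks halves each spatial dimension
pool = nn.MaxPool2d(kernel_size = 2)

x = torch.randn(1, 1, 64, 64)      ## a batch of one 64 x 64 image
out = pool(relu(conv(x)))
print(out.shape)                   ## torch.Size([1, 4, 32, 32])
## Unlike our hand-built eye kernel, conv.weight is learned from data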