
PyTorch

PyTorch is a widely used library for fitting artificial intelligence models. At its core, PyTorch is a tensor library, and tensors are simply multidimensional arrays. Since NumPy is the other core Python library for manipulating multidimensional arrays, PyTorch has several methods dedicated to converting between the two.

First let’s create a basic tensor.

import torch
import numpy as np

# 1. Creating Tensors from data
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
x_data
tensor([[1, 2], [3, 4]])
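
Every tensor also carries a few attributes worth inspecting early on. A quick sketch (the values in the comments are what these calls should print for the tensor above):

print(x_data.shape)   # torch.Size([2, 2])
print(x_data.dtype)   # torch.int64, inferred from the Python ints
print(x_data.device)  # cpu, unless the tensor has been moved to an accelerator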

Much like NumPy, PyTorch has utilities for creating arrays filled with constant or random values. Here rand draws from the uniform distribution on [0, 1).

shape = (2, 3)
rand_tensor = torch.rand(shape)  # Random numbers from uniform distribution [0, 1)
ones_tensor = torch.ones(shape)  # All ones

print(ones_tensor)
print(rand_tensor)
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[0.7178, 0.4327, 0.2454],
        [0.3034, 0.6983, 0.2065]])
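
If you want draws from a standard normal instead, randn is the analogue of rand; zeros and full cover the remaining constant fillers. A minimal sketch:

zeros_tensor = torch.zeros(shape)     # All zeros
randn_tensor = torch.randn(shape)     # Standard normal draws (mean 0, sd 1)
full_tensor = torch.full(shape, 7.0)  # Every entry set to the given constant

print(zeros_tensor)
print(full_tensor)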

As mentioned, it is easy to switch between NumPy and PyTorch. Notice that the tensor returned by from_numpy is not a copy: it shares memory with the NumPy array, so modifying one modifies the other.

np_array = np.array([10, 20, 30]) ## Create a numpy array
t_from_np = torch.from_numpy(np_array) ## Create a tensor from the numpy array
print(np_array)
print(t_from_np)
np_array[0] = 100 ## Modify the numpy array
print(np_array)
print(t_from_np)
[10 20 30]
tensor([10, 20, 30])
[100  20  30]
tensor([100,  20,  30])
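
The conversion works in the other direction too. The numpy method on a (CPU) tensor returns an array over the same memory, so, as above, changes propagate both ways. A small sketch:

t = torch.ones(3)
np_from_t = t.numpy()  ## Shares memory with the tensor (CPU tensors only)
t.add_(1)              ## Modify the tensor in place
print(np_from_t)       ## Prints [2. 2. 2.] -- the array sees the change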

PyTorch supports the usual array arithmetic, including element-wise and matrix operations. Here are some examples.

tensor_a = torch.tensor([[1, 2], [3, 4]])
tensor_b = torch.tensor([[5, 6], [7, 8]])

print("Dimensions", [tensor_a.shape, tensor_b.shape])

# Element-wise multiplication
print("Element-wise multiplication:\n", tensor_a * tensor_b)

# Matrix Multiplication, similar to numpy 
# Can use .matmul() or the @ operator
print("Matrix Multiplication:\n", tensor_a @ tensor_b)

# In-place operations (denoted by an underscore)
# This modifies tensor_a directly rather than creating a new copy
tensor_a.add_(5)
print("Tensor A after in-place addition:\n", tensor_a)
Dimensions [torch.Size([2, 2]), torch.Size([2, 2])]
Element-wise multiplication:
 tensor([[ 5, 12],
        [21, 32]])
Matrix Multiplication:
 tensor([[19, 22],
        [43, 50]])
Tensor A after in-place addition:
 tensor([[6, 7],
        [8, 9]])
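
For contrast, the out-of-place versions (without the underscore) return a new tensor and leave the original alone. One caveat worth knowing: PyTorch raises an error if you apply an in-place operation to a leaf tensor created with requires_grad=True, since that would corrupt its gradient bookkeeping. A short sketch of the out-of-place form:

tensor_c = torch.tensor([[1, 2], [3, 4]])
tensor_d = tensor_c.add(5)  # Returns a new tensor; tensor_c is unchanged
print(tensor_c)             # Still [[1, 2], [3, 4]]
print(tensor_d)             # [[6, 7], [8, 9]]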

“Broadcasting” allows PyTorch to perform operations on tensors of different shapes: it automatically “stretches” the smaller tensor to match the larger one without copying data. This is similar to R, which does the same thing automatically in its base matrix algebra. It’s easy to get mixed up with these sorts of calculations and to mess up transposes, so test your code thoroughly.

matrix = torch.tensor([[1, 2, 3], [4, 5, 6]]) 
vector = torch.tensor([10, 20, 30])
print("Dimensions", [matrix.shape, vector.shape])

## Notice how the vector is duplicated 
result = matrix + vector

print("Broadcasting result:\n", result)
Dimensions [torch.Size([2, 3]), torch.Size([3])]
Broadcasting result:
 tensor([[11, 22, 33],
        [14, 25, 36]])
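
Broadcasting stretched the vector across the rows here because the trailing dimensions matched (both are 3). To add one value per row instead, the vector needs an explicit trailing dimension; otherwise the shapes don’t align and PyTorch raises an error. A sketch of the kind of shape bug the warning above refers to (unsqueeze adds a dimension of size 1):

col = torch.tensor([10, 20])  ## Shape (2,)
## matrix + col would raise an error: (2, 3) and (2,) are not broadcastable
col_2d = col.unsqueeze(1)     ## Shape (2, 1): one value per row
print(matrix + col_2d)        ## Rows become [11, 12, 13] and [24, 25, 26]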

Perhaps PyTorch’s best attribute is its ability to calculate derivatives. Let’s go through a simple example where we can calculate the derivative ourselves. Here a and b are the variables we differentiate with respect to, evaluated at the specific values given (2 and 6). The requires_grad=True option tells PyTorch to keep track of the gradient, and the backward method computes it.

# 'requires_grad=True' tells PyTorch to track every operation on these tensors
a = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([6.0], requires_grad=True)

# Define the function
Q = 3*a**3 - b**2

print(f"Q value: {Q.item()}") # (3 * 8) - 36 = 24 - 36 = -12

# Compute gradients (Backpropagation)
# This traverses the graph backwards to calculate derivatives
Q.backward()

# Check the results against analytic calculus
# Expected dQ/da = 9 * (2^2) = 36
print(f"Computed dQ/da: {a.grad.item()}")

# Expected dQ/db = -2 * 6 = -12
print(f"Computed dQ/db: {b.grad.item()}")
Q value: -12.0
Computed dQ/da: 36.0
Computed dQ/db: -12.0
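
These gradients are the engine behind model fitting: compute a loss, call backward, and nudge the parameters downhill. A minimal sketch of one gradient-descent step (the toy data point and learning rate are illustrative choices, not part of any real workflow):

# Fit y = w * x to one toy data point (x=3, y=6); the true w is 2
w = torch.tensor([1.0], requires_grad=True)
x, y = torch.tensor([3.0]), torch.tensor([6.0])

loss = ((w * x - y) ** 2).mean()  # Squared error: (1*3 - 6)^2 = 9
loss.backward()                   # dloss/dw = 2 * (w*x - y) * x = -18

with torch.no_grad():             # Update without tracking the update itself
    w -= 0.01 * w.grad            # One gradient-descent step: w is now 1.18
    w.grad.zero_()                # Clear the gradient before the next step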

The functions involved can be far more complex than this example, and that’s exactly where we’ll use these gradients when building our AI models.