Base R

R is another data science language, one geared more specifically toward statistics. R is an implementation of the S language, which was created by John Chambers; R itself was originally written by Ross Ihaka and Robert Gentleman. The R Core Team, the R Foundation, and countless contributors continually improve and maintain R.

In this course, we're going to cover very little base R and focus mainly on the so-called tidyverse. However, a little base R goes a long way. You can add R as a kernel in Jupyter notebooks and then select it as an option. Alternatively, RStudio is a well-known R IDE, as is ESS in Emacs.

R installation

You can install R into a conda environment or directly from CRAN. On Windows, make sure to install Rtools as well. On a Mac, I always made sure that I had the developer tools installed. On Linux, it's a good idea to install the development versions of R's system dependencies.

# Comments in R start with the # sign, just like in Python
# Arithmetic works like you would expect
1 + 2 * 3
# Assign a variable; note that y is a separate entity from x (try the same example in Python)
x = 5
y = x
x = 10
y
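The Python comparison is most striking with a container. In Python, after `b = a` and `a[0] = 100`, `b` changes too, because both names refer to the same list. R vectors instead behave like independent values: R copies an object when it is modified. A minimal sketch (the names `v` and `u` are just for illustration):

```r
# Modifying a vector after "copying" it leaves the copy untouched
v = c(1, 2, 3)
u = v        # u and v start out with the same values
v[1] = 100   # changing v makes a copy first, so u is unaffected
u            # values: 1 2 3
v            # values: 100 2 3
```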

List the variables that we've created so far.

ls()

Honestly, the c function is perhaps one of the most important functions in R. It concatenates its arguments into a single vector.

z = c(1, 5, 8)
# Most operations are elementwise
z + 5 * z
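Because c concatenates into one flat vector, nesting calls does not create nested structure, and mixing types coerces everything to a common type. A small sketch with illustrative variable names:

```r
# c() flattens: nesting calls still produces one flat vector
flat = c(c(1, 2), 3)
flat             # values: 1 2 3
# Mixing types coerces everything to the most general type
mixed = c(1, "a")
typeof(mixed)    # "character"; the number 1 becomes the string "1"
nums = c(1, TRUE)
typeof(nums)     # "double"; TRUE becomes 1
```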

But be careful: R will guess what you want to do. In this case, adding a vector of length 3 to a vector of length 6 recycles the shorter vector, repeating it twice. If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.

z + c(z, z)
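Here is a small sketch contrasting silent recycling with the case where R complains; the names `silent`, `noisy`, and `msg` are illustrative. tryCatch is used only to capture the warning text instead of printing it.

```r
z = c(1, 5, 8)
# 6 is a multiple of 3, so z is recycled twice with no complaint
silent = z + c(z, z)
silent   # values: 2 10 16 2 10 16
# 4 is not a multiple of 3: R still recycles z as 1 5 8 1, but warns
noisy = suppressWarnings(z + c(1, 2, 3, 4))
noisy    # values: 2 7 11 5
# Capture the warning message itself
msg = tryCatch(z + c(1, 2, 3, 4), warning = function(w) conditionMessage(w))
msg      # "longer object length is not a multiple of shorter object length" (English locale)
```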

R’s Boolean values are TRUE and FALSE.

3 == 4
3 == 3
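Like arithmetic, comparisons are elementwise, so comparing a vector produces a vector of TRUE and FALSE values. A short sketch reusing the vector z from above (the other names are illustrative):

```r
z = c(1, 5, 8)
big = z > 4              # elementwise comparison: FALSE TRUE TRUE
both = big & (z < 7)     # elementwise AND: FALSE TRUE FALSE
either = !big | (z == 8) # elementwise NOT and OR: TRUE FALSE TRUE
n_big = sum(big)         # TRUE counts as 1 and FALSE as 0, so this is 2
```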

Control flow

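R has for loops, while loops, and if/else statements. The syntax a : b creates a vector counting from a to b in steps of 1. Unlike Python, R does not use indentation to define blocks, so indentation is purely cosmetic; you use curly braces to group the statements that belong to a for, if, or while. By convention, else goes on the same line as the closing brace of the if block, because at the top level an else that starts a new line is a syntax error.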

for (i in 1 : 6) {
    if (i <= 3) {
        print("i is small")
    } else {
        print("i is large")
    }
}

[1] "i is small"
[1] "i is small"
[1] "i is small"
[1] "i is large"
[1] "i is large"
[1] "i is large"
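For completeness, here is a sketch of a while loop, a countdown with a : b, and a vectorized alternative to the loop above using base R's ifelse. The halving example and variable names are illustrative only.

```r
# A while loop: halve n until it drops below 1, counting the steps
n = 20
steps = 0
while (n >= 1) {
    n = n / 2
    steps = steps + 1
}
steps        # 5 (20 -> 10 -> 5 -> 2.5 -> 1.25 -> 0.625)

# a : b counts down when a > b
countdown = 6 : 1

# ifelse applies the test to a whole vector at once, replacing the for/if loop
labels = ifelse(1 : 6 <= 3, "i is small", "i is large")
```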

Data structures

R's generic container is a list, created with the function list. Its matrix structure is a matrix, created with matrix, and its data frame structure is a data frame, created with data.frame.

x = list(a = 1 : 3, b = "character", c = list(a = 1 : 4, b = "character2"))

Now, x is a list containing three elements: a is a vector, b is a string, and c is itself another list. You can reference elements of x with $ or with double brackets.

x$a
x$b
x[[1]]
x[[2]]

A brief technicality: x[1] returns a list containing the first element of x, whereas x[[1]] returns the element itself. Also note that R starts counting at 1 (unlike Python, which starts at 0). Now let's create a data frame.
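Before doing that, here is a quick check of the single versus double bracket distinction on the same list, using class to see what each form returns. The names `single` and `double` are illustrative.

```r
x = list(a = 1 : 3, b = "character", c = list(a = 1 : 4, b = "character2"))
single = x[1]      # a list of length 1 that still contains a
double = x[[1]]    # the integer vector 1 2 3 itself
class(single)      # "list"
class(double)      # "integer"
# Nested elements can be reached by chaining $ or [[ ]]
x$c$b              # "character2"
x[["c"]][["a"]]    # values: 1 2 3 4
```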

x = data.frame(index = 3 : 7, letter = letters[3 : 7])
x

The $ operator works on data frames too, and so does bracket notation: x[i, j] selects row i and column j, and leaving an index blank selects all rows or all columns.

x[,1]
x[1 : 2,]
x[1,2]
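A few more ways to pull pieces out of the same data frame, as a sketch. Columns can be selected by name, and a logical condition in the row slot keeps only the matching rows. Note that in R 4.0 and later the letter column is a character vector; older versions converted strings to factors by default.

```r
x = data.frame(index = 3 : 7, letter = letters[3 : 7])
dim(x)                   # 5 rows, 2 columns
x$letter                 # values: "c" "d" "e" "f" "g"
x[, "index"]             # select a column by name instead of position
big = x[x$index > 4, ]   # keep the rows where index > 4 (all columns)
big$index                # values: 5 6 7
```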

Finally, let’s cover matrices.

x = matrix( 1 : 6, 3, 2)
x
y = matrix( 1 : 6, 2, 3)
y
x[1,]
x[,1]
x[1, 2]
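One detail worth seeing explicitly: matrix fills its values column by column unless you pass byrow = TRUE. Since x is 3 x 2 and y is 2 x 3, they can also be multiplied with the matrix product operator %*% (plain * is elementwise and would fail here because the dimensions differ). A sketch; the names `xr` and `xy` are illustrative.

```r
x = matrix(1 : 6, 3, 2)                  # columns are 1 2 3 and 4 5 6
xr = matrix(1 : 6, 3, 2, byrow = TRUE)   # rows are 1 2, 3 4, and 5 6
y = matrix(1 : 6, 2, 3)
dim(x)           # 3 2
x[1, ]           # values: 1 4
xr[1, ]          # values: 1 2
xy = x %*% y     # (3 x 2) times (2 x 3) gives a 3 x 3 matrix
xy[1, 1]         # 9, i.e. 1 * 1 + 4 * 2
dim(t(x))        # the transpose is 2 x 3
```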

Functions

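R has first-class functions and uses lexical scoping. Arguments can be passed by name or by position in function calls: named arguments are matched first, and the unnamed ones then fill the remaining parameters in order. But, just like in Python, don't get too cute with this.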

pow = function(x, n) {
    x ^ n
}
pow(2, 3)
pow(x = 2, n = 3)
pow(n = 3, x = 2)
pow(n = 3, 2)
pow(3, 2)
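R will also accept an unambiguous abbreviation of a parameter name (partial matching), which is exactly the kind of cuteness to avoid in real code. A sketch using a hypothetical function `power` with longer parameter names, for illustration only; all three calls compute 2 ^ 3.

```r
# Hypothetical function, defined here only to illustrate argument matching
power = function(base, exponent) {
    base ^ exponent
}
a1 = power(2, 3)              # positional: base = 2, exponent = 3
a2 = power(exponent = 3, 2)   # named first, then 2 fills the remaining slot (base)
a3 = power(exp = 3, 2)        # "exp" partially matches exponent; legal but confusing
```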

Functions can be passed as arguments to other functions. The ... argument collects a variable number of extra arguments, which can then be passed along to another function.

doublefunc = function(f, x, ...){
    f(x, ...) * 2
}
doublefunc(pow, 2, 3)
doublefunc(exp, 2)
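To see what ... actually holds, you can capture it with list(...). Base R's sapply is another function that takes a function as an argument and forwards extra named arguments through its own .... A sketch; `count_extra` is a hypothetical helper and pow is redefined so the block stands alone.

```r
pow = function(x, n) {
    x ^ n
}
# Hypothetical helper: list(...) gathers whatever extra arguments were passed
count_extra = function(...) {
    length(list(...))
}
none = count_extra()              # 0
three = count_extra(1, "a", TRUE) # 3
# sapply calls pow(1, n = 2), pow(2, n = 2), pow(3, n = 2)
squares = sapply(1 : 3, pow, n = 2)
squares                           # values: 1 4 9
```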

Variables are visible only within the environment in which they are defined (and in environments nested inside it). In the example below, c1 isn't found outside f, since it's only defined within f's environment. Similarly, e isn't found within f, since it's only defined within the function g, which is itself defined within f. Finally, since c is a predefined function, I named the variable c1 instead. R will happily let you define a variable called c, but doing so creates confusion, so check whether a name is already in use before assigning to it.

a = 2
f = function(b){
    c1 = 3
    g = function(d){
        e = 4
        return(1)
    }
    # DOESN'T WORK: e exists only inside g's environment
    #print(e)
    return(1)
}
# DOESN'T WORK: c1 exists only inside f's environment
#print(c1)
f(1)
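The flip side of lexical scoping is that an inner function can read variables from the environments that enclose it. Here is a minimal sketch reusing the names from the example above, except that g now uses the outer variables and is actually called; exists checks whether a name is visible from the current environment. The names `result`, `found_c1`, and `found_c` are illustrative.

```r
a = 2
f = function(b) {
    c1 = 3
    g = function(d) {
        # g has no local a, b, or c1, so R looks them up where g was defined:
        # b and c1 come from f's environment, a from the global environment
        a + b + c1 + d
    }
    g(4)
}
result = f(1)           # 2 + 1 + 3 + 4 = 10
found_c1 = exists("c1") # FALSE: c1 lived only inside f
found_c = exists("c")   # TRUE: the name c is already taken by the built-in function
```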