Base R#

R is another data science language, geared more specifically towards statistics. It is an implementation of the S language, which was created by John Chambers and colleagues; R itself was originally written by Ross Ihaka and Robert Gentleman. The R Core Team, the R Foundation and a very large number of contributors continue to maintain and improve R.

In this course, we’re going to cover very little base R and focus mainly on the so-called tidyverse. However, a little base R goes a long way. You can install an R kernel for Jupyter notebooks and then have R as an option there. Alternatively, RStudio is a well-known R IDE, as is ESS in Emacs.

R installation#

You can install R into a conda environment or directly from CRAN. On Windows, make sure to install Rtools as well. On a Mac, I always made sure that I had the developer tools installed. On Linux, it’s a good idea to have the development versions of R’s system dependencies installed.

# Comments in R start with the # sign, just like in Python
# Arithmetic works like you would expect
1 + 2 * 3

7

# Assign a variable; note that y is a separate entity from x (run this same example in Python)
x = 5
y = x
x = 10
y

5

List the variables that we’ve created.

ls()

'x'
'y'

Honestly, the c function is perhaps one of the most important functions in R. It concatenates its arguments into a vector.

z = c(1, 5, 8)
# Most operations are elementwise
z + 5 * z

6
30
48

But be careful: R will guess what you want to do. In this case, adding a vector of length 3 to a vector of length 6 simply repeats (recycles) the vector of length 3 twice.

z + c(z, z)

2
10
16
2
10
16
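When the longer length is not a multiple of the shorter one, R still recycles but also issues a warning. The sketch below is only illustrative: the names short_vec, long_vec, total and msg are assumptions, not part of the course material.

```r
short_vec = c(1, 2)
long_vec = c(10, 20, 30)

# short_vec is recycled as 1, 2, 1 to match the length of long_vec
total = suppressWarnings(long_vec + short_vec)
total

# Capture the warning text instead of letting it print
msg = tryCatch(long_vec + short_vec, warning = function(w) conditionMessage(w))
msg
```

Here total holds 11, 22 and 31, and msg holds the text of the warning that R would otherwise print.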

R’s Boolean values are TRUE and FALSE.

3 == 4
3 == 3

FALSE

TRUE
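Comparisons are elementwise too, so comparing a vector gives a vector of TRUE and FALSE values. A small sketch, reusing the vector z from earlier; the names is_big, in_range and n_big are hypothetical:

```r
z = c(1, 5, 8)

# One TRUE or FALSE per element of z
is_big = z > 4
is_big

# & combines two logical vectors element by element
in_range = z > 4 & z < 8
in_range

# TRUE counts as 1 and FALSE as 0 when summed
n_big = sum(is_big)
n_big
```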

Control flow#

R has for loops, while loops and if/else statements. The syntax a : b creates a vector starting at a and ending at b, in steps of 1. Note that, unlike in Python, indentation doesn’t mean anything in R, so you have to use curly braces to group the statements that belong to a for, if or while.

for (i in 1 : 6){
    if (i <= 3) {
        print("i is small")
    } else {
        print("i is large")
    }
}

[1] "i is small"
[1] "i is small"
[1] "i is small"
[1] "i is large"
[1] "i is large"
[1] "i is large"
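The for loop above has a while counterpart. A minimal sketch, assuming a hypothetical counter named countdown:

```r
# Count down from 3; the loop body runs while the condition is TRUE
countdown = 3
while (countdown > 0) {
    print(countdown)
    countdown = countdown - 1
}
```

The loop prints 3, 2 and 1, and stops once countdown reaches 0.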

Data structures#

R’s generic structure is the list, which can be made with the command list. Its generic matrix structure is the matrix, made with the command matrix, and its generic data frame structure is the data frame, made with the command data.frame.

x = list(a = 1 : 3, b = "character", c = list(a = 1 : 4, b = "character2"))

Now, x is a list containing three elements: a is a vector, b is a string and c is itself another list. You can reference elements of x with $ or with double brackets.

x$a
x$b
x[[1]]
x[[2]]

1
2
3

'character'

1
2
3

'character'

Brief technicality: x[1] returns a list containing the first element of x, whereas x[[1]] returns the element itself. Also note that R starts counting at 1 (unlike Python, which starts at 0). Let’s create a data frame.
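Before building it, the bracket technicality can be checked directly. This is only a sketch on a hypothetical list named lst, kept separate from x:

```r
lst = list(a = 1 : 3, b = "character")

# Single brackets keep the list wrapper
single = lst[1]
class(single)

# Double brackets return the stored vector itself
double = lst[[1]]
class(double)

# Indexing starts at 1, so this is the first value of the vector a
first_value = double[1]
first_value
```

With that aside covered, here is the data frame.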

x = data.frame(index = 3 : 7, letter = letters[3 : 7])
x

A data.frame: 5 × 2
index	letter
<int>	<chr>
3	c
4	d
5	e
6	f
7	g

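The $ operator works on data frames. Bracket notation works as well: the first index selects rows, the second selects columns, and leaving an index blank selects everything along that dimension.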

x[,1]
x[1 : 2,]
x[1,2]

3
4
5
6
7

A data.frame: 2 × 2
	index	letter
	<int>	<chr>
1	3	c
2	4	d

'c'
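The $ access mentioned above is not shown in the bracket examples, so here is a short sketch of it along with a row filter. It rebuilds the same data frame under the hypothetical name df so that it runs on its own; col_letter, col_index and big_rows are likewise illustrative names.

```r
df = data.frame(index = 3 : 7, letter = letters[3 : 7])

# $ pulls out a single column by name
col_letter = df$letter
col_letter

# Double brackets with a column name do the same thing
col_index = df[["index"]]
col_index

# A logical vector in the row position keeps only the matching rows
big_rows = df[df$index > 5, ]
big_rows
```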

Finally, let’s cover matrices.

x = matrix( 1 : 6, 3, 2)
x
y = matrix( 1 : 6, 2, 3)
y

A matrix: 3 × 2 of type int
1	4
2	5
3	6

A matrix: 2 × 3 of type int
1	3	5
2	4	6

x[1,]
x[,1]
x[1, 2]

1
4

1
2
3

4
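As the printed matrices suggest, matrix fills its entries column by column. A brief sketch of the row-wise alternative, using the byrow argument of matrix; the names m, m_dim and second_row are illustrative:

```r
# Fill a 2 x 3 matrix row by row instead of column by column
m = matrix(1 : 6, nrow = 2, ncol = 3, byrow = TRUE)
m

# dim gives the number of rows, then the number of columns
m_dim = dim(m)
m_dim

# The second row now holds 4, 5, 6
second_row = m[2, ]
second_row
```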

Functions#

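R has functions and uses so-called lexical scoping. In function calls, arguments can be passed by name or by position; named arguments are matched first, and the remaining values fill the unmatched parameters in order. But, just like in Python, don’t get too cute with this.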

pow = function(x, n) {
    x ^ n
}
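pow(2, 3)          # positional: x = 2, n = 3
pow(x = 2, n = 3)  # both arguments named
pow(n = 3, x = 2)  # named arguments can come in any order
pow(n = 3, 2)      # n is matched by name, so 2 fills x
pow(3, 2)          # positional: x = 3, n = 2

8

8

8

8

9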

Functions can be arguments to functions. The ... argument collects any additional arguments, which can then be passed along to another function (here, to f).

doublefunc = function(f, x, ...){
    f(x, ...) * 2
}
doublefunc(pow, 2, 3)
doublefunc(exp, 2)

16

14.7781121978613
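Named arguments can travel through ... as well. The sketch below repeats the definitions of pow and doublefunc so that it is self-contained; the result names r1 and r2 are hypothetical, and round is the base R rounding function with its digits argument.

```r
pow = function(x, n) {
    x ^ n
}
doublefunc = function(f, x, ...){
    f(x, ...) * 2
}

# n = 3 is passed through ... to pow, giving 2 ^ 3 * 2
r1 = doublefunc(pow, 2, n = 3)
r1

# digits = 2 is passed through ... to round, giving 3.14 * 2
r2 = doublefunc(round, 3.14159, digits = 2)
r2
```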

Variables live within the scope of the environment in which they are defined. Note below that the variable c1 isn’t found at the top level, since it’s only defined within f’s environment. Similarly, e is not found within f, since it’s only defined within the function g, which is itself defined within f. Finally, note that since c is a predefined function, I named the variable c1 instead of c. R will let you define c without complaint, but doing so creates confusion. Check whether a name is already in use before assigning to it.

a = 2
f = function(b){
    c1 = 3
    g = function(d){
        e = 4
        return(1)
    }
    # DOESN'T WORK: e is local to g
    #print(e)
    return(1)
}
# DOESN'T WORK: c1 is local to f
#print(c1)

f(1)

1
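The example above shows what is not visible from outside a function. The other direction of lexical scoping, where an inner function can see the variables of the environments it was defined in, might be sketched as follows. This is a variation on the code above, not the original: g now returns a sum, f calls g, and the name total is hypothetical.

```r
a = 2
f = function(b){
    c1 = 3
    g = function(d){
        # g sees its own d, f's b and c1, and the top-level a
        a + b + c1 + d
    }
    g(4)
}

# 2 + 1 + 3 + 4
total = f(1)
total
```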