Base R
R is another data science language, one more specifically geared toward statistics. R is an implementation of the S language, which was developed by John Chambers and colleagues at Bell Labs; R itself was originally written by Ross Ihaka and Robert Gentleman. The R Core Team, the R Foundation and countless contributors continually maintain and improve R.
In this course, we'll cover only a little base R and focus mainly on the so-called tidyverse. However, a little base R goes a long way. You can add R to Jupyter notebooks as a kernel (via the IRkernel package) and then select it like any other kernel. Alternatively, RStudio is a well-known R IDE, as is ESS (Emacs Speaks Statistics) in Emacs.
R installation
You can install R into a conda environment or directly from CRAN. If you're on Windows, make sure to install Rtools as well, so that packages can be built from source. On a Mac, I always made sure that I had the developer tools installed. On Linux, it's a good idea to install the development (-dev) versions of the system libraries that R and its packages depend on.
```r
# Comments in R start with the # sign, just like Python
# Arithmetic works like you would expect
1 + 2 * 3
#> [1] 7
# Assign a variable; y is a separate entity from x, so reassigning x
# leaves y unchanged (try the same example in Python)
x = 5
y = x
x = 10
y
#> [1] 5
```
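Assignment in R behaves as if the value were copied, which matters most for vectors. The minimal sketch below modifies an element of `y` after `y = x` and shows that `x` is untouched; the analogous Python code with a list would change both names, since they would refer to the same object.

```r
# y = x gives y its own value; modifying y afterward does not touch x
x = c(1, 2, 3)
y = x
y[1] = 100
x
#> [1] 1 2 3
y
#> [1] 100   2   3
```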
List the variables we've created so far.
```r
ls()
#> [1] "x" "y"
```
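As a small companion to `ls()`, base R's `exists()` checks whether a name is defined and `rm()` deletes a variable. This sketch assumes a fresh session where no other `x` is visible (for example, from an attached package).

```r
x = 5
exists("x")
#> [1] TRUE
# rm() removes the variable from the current environment
rm(x)
exists("x")
#> [1] FALSE
```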
The `c` function is perhaps one of the most important functions in R. It concatenates (combines) values into a vector.
```r
z = c(1, 5, 8)
# Most operations are elementwise
z + 5 * z
#> [1]  6 30 48
```
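A few more things `c` and vectors can do, as a short sketch: `c` also joins existing vectors with single values, `length` counts elements, and functions like `sum` operate on a whole vector, while arithmetic such as `z * z` stays elementwise.

```r
z = c(1, 5, 8)
# c() joins vectors and single values into one longer vector
w = c(z, 10, z)
w
#> [1]  1  5  8 10  1  5  8
length(w)
#> [1] 7
# Summary functions use the whole vector; arithmetic is elementwise
sum(z)
#> [1] 14
z * z
#> [1]  1 25 64
```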
Be careful, though: R silently recycles the shorter vector to match the longer one instead of raising an error. In this case, adding a vector of length 3 to a vector of length 6 just repeats the length-3 vector twice.
```r
z + c(z, z)
#> [1]  2 10 16  2 10 16
```
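Recycling is also what makes adding a single number to a vector work. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning; the sketch below captures that warning's text with `tryCatch` so it can be inspected (the exact wording assumes an English locale).

```r
z = c(1, 5, 8)
# A single number is recycled across every element
z + 1
#> [1] 2 6 9
# Length 2 does not divide length 3: R recycles anyway, but warns
msg = tryCatch(z + c(1, 2), warning = function(w) conditionMessage(w))
msg
#> [1] "longer object length is not a multiple of shorter object length"
```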
R's Boolean values are `TRUE` and `FALSE`.
```r
3 == 4
#> [1] FALSE
3 == 3
#> [1] TRUE
```
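Comparisons are elementwise too, so they produce logical vectors. A short sketch: a logical vector can select elements of another vector, and because `TRUE` counts as 1 and `FALSE` as 0, `sum` counts how many elements pass the test.

```r
z = c(1, 5, 8)
z > 3
#> [1] FALSE  TRUE  TRUE
# Use a logical vector to keep only the matching elements
z[z > 3]
#> [1] 5 8
# TRUE counts as 1, FALSE as 0
sum(z > 3)
#> [1] 2
# & is "and", ! is "not"
TRUE & FALSE
#> [1] FALSE
!TRUE
#> [1] FALSE
```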
Control flow
R has for loops, while loops and if/else statements. The syntax `a : b` creates a vector counting up by 1 from `a` to `b` (for example, `1 : 6` is 1, 2, 3, 4, 5, 6). Unlike Python, R doesn't treat indentation as meaningful; instead, you use curly braces to group the statements that belong to a `for`, `if` or `while`.
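```r
for (i in 1 : 6) {
  if (i <= 3) {
    print("i is small")
  } else {
    # Keep else on the same line as the closing brace;
    # a lone else on its own line is a syntax error at top level
    print("i is large")
  }
}
#> [1] "i is small"
#> [1] "i is small"
#> [1] "i is small"
#> [1] "i is large"
#> [1] "i is large"
#> [1] "i is large"
```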
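For completeness, here is a minimal `while` loop that adds up 1 through 5, followed by `ifelse`, base R's vectorized if/else, which applies the same small/large test as the loop above to every element at once. The shortened labels are just for illustration.

```r
# A while loop: keep going until the condition is FALSE
i = 1
total = 0
while (i <= 5) {
  total = total + i
  i = i + 1
}
total
#> [1] 15
# ifelse() tests every element of a vector in one call
ifelse(1 : 6 <= 3, "small", "large")
#> [1] "small" "small" "small" "large" "large" "large"
```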
Data structures
R's general-purpose container is the list, which is made with the function `list`; its matrix structure is made with `matrix`; and its data frame structure is made with `data.frame`.
```r
x = list(a = 1 : 3, b = "character", c = list(a = 1 : 4, b = "character2"))
```
Now `x` is a list containing three elements: `a` is a vector, `b` is a string and `c` is itself another list. You can reference elements of `x` with the `$` operator or with double brackets.
```r
x$a
#> [1] 1 2 3
x$b
#> [1] "character"
x[[1]]
#> [1] 1 2 3
x[[2]]
#> [1] "character"
```
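To see the single-versus-double-bracket distinction concretely, and to reach into the nested list `c`, a short sketch: `class` reports what each kind of indexing returns, and `$` or `[[ ]]` can be chained to go one level deeper.

```r
x = list(a = 1 : 3, b = "character", c = list(a = 1 : 4, b = "character2"))
# Single brackets return a sub-list; double brackets return the element
class(x[1])
#> [1] "list"
class(x[[1]])
#> [1] "integer"
# Chain $ (or [[ ]]) to index into the nested list c
x$c$a
#> [1] 1 2 3 4
x[["c"]][["b"]]
#> [1] "character2"
names(x)
#> [1] "a" "b" "c"
```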
A brief technicality: `x[1]` returns a list containing the first element of `x`, whereas `x[[1]]` returns the element itself. Also note that R starts counting at 1 (unlike Python, which starts at 0). Now let's create a data frame.
```r
x = data.frame(index = 3 : 7, letter = letters[3 : 7])
x
```

| index `<int>` | letter `<chr>` |
|---|---|
| 3 | c |
| 4 | d |
| 5 | e |
| 6 | f |
| 7 | g |
The `$` operator works on data frames, and so does bracket notation, `x[row, column]`; leaving a position blank selects all rows or all columns.
```r
x[, 1]
#> [1] 3 4 5 6 7
x[1 : 2, ]
```

|   | index `<int>` | letter `<chr>` |
|---|---|---|
| 1 | 3 | c |
| 2 | 4 | d |

```r
x[1, 2]
#> [1] "c"
```
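A few common data frame operations, as a sketch that assumes R 4.0 or later (where `data.frame` keeps strings as character rather than converting them to factors): pulling out a column with `$`, filtering rows with a logical condition, and adding a column by assigning to a new name. The `squared` column is purely illustrative.

```r
x = data.frame(index = 3 : 7, letter = letters[3 : 7])
x$letter
#> [1] "c" "d" "e" "f" "g"
# A logical vector picks rows; a column name picks the column
x[x$index > 5, "letter"]
#> [1] "f" "g"
# Assigning to a new $ name adds a column (hypothetical "squared")
x$squared = x$index ^ 2
dim(x)
#> [1] 5 3
```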
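Finally, let's cover matrices. Note that `matrix` fills in its entries column by column by default (pass `byrow = TRUE` to fill row by row).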
```r
x = matrix(1 : 6, 3, 2)
x
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
y = matrix(1 : 6, 2, 3)
y
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
```
```r
x[1, ]
#> [1] 1 4
x[, 1]
#> [1] 1 2 3
x[1, 2]
#> [1] 4
```
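Matrices support linear algebra directly. In this sketch, `%*%` is matrix multiplication (whereas `*` would be elementwise), `t` transposes, and filling `1 : 6` by rows into a 2 by 3 matrix reproduces the transpose of `x`, which illustrates the column-first default.

```r
x = matrix(1 : 6, 3, 2)
y = matrix(1 : 6, 2, 3)
# %*% multiplies a 3 x 2 matrix by a 2 x 3 matrix, giving 3 x 3
xy = x %*% y
dim(xy)
#> [1] 3 3
# Row 1 of x times column 1 of y: 1 * 1 + 4 * 2
xy[1, 1]
#> [1] 9
# Transposing x matches filling the values in by rows
identical(t(x), matrix(1 : 6, 2, 3, byrow = TRUE))
#> [1] TRUE
```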
Functions
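R has functions and uses so-called lexical scoping. In function calls, arguments can be passed by position or by name: named arguments are matched first, and the remaining unnamed arguments fill the other parameters in order. But, just like in Python, don't get too cute with this.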
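```r
pow = function(x, n) {
  x ^ n
}
pow(2, 3)
#> [1] 8
pow(x = 2, n = 3)
#> [1] 8
pow(n = 3, x = 2)
#> [1] 8
# n is matched by name, so the unnamed 2 goes to x
pow(n = 3, 2)
#> [1] 8
pow(3, 2)
#> [1] 9
```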
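Parameters can also be given default values, which are used whenever the caller leaves that argument out. A minimal sketch with a hypothetical `pow_default` whose exponent defaults to 2:

```r
# Hypothetical variant of pow with a default exponent of 2
pow_default = function(x, n = 2) {
  x ^ n
}
# n is omitted, so the default 2 is used
pow_default(3)
#> [1] 9
# Supplying n overrides the default
pow_default(3, n = 3)
#> [1] 27
```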
Functions can be passed as arguments to other functions. The `...` argument collects any number of additional arguments, which can then be passed along to another function.
```r
doublefunc = function(f, x, ...) {
  # Forward any extra arguments in ... on to f
  f(x, ...) * 2
}
doublefunc(pow, 2, 3)
#> [1] 16
doublefunc(exp, 2)
#> [1] 14.77811
```
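Two related patterns, sketched briefly: an anonymous function (one never given a name) can be passed straight to `doublefunc`, and `list(...)` gathers whatever extra arguments arrived so a function can inspect them. `count_args` is a hypothetical helper for illustration; `sapply` is base R's way of calling a function on each element of a vector.

```r
doublefunc = function(f, x, ...) {
  f(x, ...) * 2
}
# Pass an anonymous function directly
doublefunc(function(v) v + 1, 3)
#> [1] 8
# Hypothetical helper: list(...) collects the extra arguments
count_args = function(...) {
  length(list(...))
}
count_args(1, "a", TRUE)
#> [1] 3
# sapply() applies a function to each element of a vector
sapply(1 : 3, function(i) i ^ 2)
#> [1] 1 4 9
```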
Variables are visible only within the environment in which they are defined and within functions defined inside that environment. In the example below, the variable `c1` isn't found at the top level since it's defined only within `f`'s environment. Similarly, `e` isn't found within `f` since it exists only within the function `g` defined inside `f`. Finally, since `c` is a predefined function, I named the variable `c1` instead. R will let you define `c` with no problem, but it creates confusion. Check whether a name is already in use (for example, with `exists("c")`) before assigning to it.
```r
a = 2
f = function(b) {
  c1 = 3
  g = function(d) {
    e = 4
    return(1)
  }
  # DOESN'T WORK: e only exists inside g
  # print(e)
  return(1)
}
# DOESN'T WORK: c1 only exists inside f
# print(c1)
f(1)
#> [1] 1
```
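Lexical scoping also works in the other direction: an inner function can see everything in the environments where it was defined. In this variant of the example, `g` uses its own argument `d`, `f`'s `b` and `c1`, and the global `a`. The second half shows why defining `c` is confusing rather than fatal: when R looks up a name in a function call it skips non-function objects, so `c(1, 2)` still calls the built-in function.

```r
a = 2
f = function(b) {
  c1 = 3
  g = function(d) {
    # Names are looked up in g, then in f, then in the global environment
    a + b + c1 + d
  }
  g(10)
}
# 2 + 1 + 3 + 10
f(1)
#> [1] 16
# A variable named c does not hide the function c() in a call
c = 5
c(1, 2)
#> [1] 1 2
# Remove the variable so c refers only to the function again
rm(c)
```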