{"cells":[{"cell_type":"markdown","id":"b14bd027-acc4-4785-bda1-7e23100341d1","metadata":{"id":"b14bd027-acc4-4785-bda1-7e23100341d1"},"source":["# Base R\n","\n","R is another data science language, geared more specifically towards statistics. R is an implementation of the S language (created by John Chambers), originally written by Ross Ihaka and Robert Gentleman. The R Core Team, the R Foundation and almost countless contributors continually improve and maintain R.\n","\n","In this course, we're going to cover very little base R and focus mainly on the so-called tidyverse. However, a little base R goes a long way. You can install R as a Jupyter kernel and then have it as an option in notebooks. Alternatively, RStudio is a well-known R IDE, as is ESS in Emacs.\n","\n","## R installation\n","\n","You can install R in a conda environment or directly from [CRAN](https://cran.r-project.org/). If on Windows, make sure to install Rtools as well. On a Mac, I always made sure that I had the developer tools installed. 
On Linux, it's a good idea to have the development versions of all of R's system dependencies installed."]},{"cell_type":"code","execution_count":null,"id":"beb8fcb3-3bcf-4f7e-b310-96d166e2b7aa","metadata":{"id":"beb8fcb3-3bcf-4f7e-b310-96d166e2b7aa","outputId":"8fbae2bf-265a-47ed-e6d6-70af1eccafa0"},"outputs":[{"data":{"text/html":["7"],"text/latex":["7"],"text/markdown":["7"],"text/plain":["[1] 7"]},"metadata":{},"output_type":"display_data"}],"source":["# Comments in R start with the # sign, just like in Python\n","# Arithmetic works like you would expect\n","1 + 2 * 3"]},{"cell_type":"code","execution_count":null,"id":"c4ad64e2-6bdc-491d-b520-b4b5701a4335","metadata":{"id":"c4ad64e2-6bdc-491d-b520-b4b5701a4335","outputId":"71b1abe8-c05a-498c-c993-eb93f818e60d"},"outputs":[{"data":{"text/html":["5"],"text/latex":["5"],"text/markdown":["5"],"text/plain":["[1] 5"]},"metadata":{},"output_type":"display_data"}],"source":["# Assign a variable; note that y is a separate entity from x (run this same example in Python)\n","x = 5\n","y = x\n","x = 10\n","y"]},{"cell_type":"markdown","id":"529f98cf","metadata":{"id":"529f98cf"},"source":["List the variables that we've created."]},{"cell_type":"code","execution_count":null,"id":"c6b46336","metadata":{"id":"c6b46336","outputId":"c13c9129-0478-4534-c34b-f0564054bd21"},"outputs":[{"data":{"text/html":["\n","
  1. 'x'
  2. 'y'
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 'x'\n","\\item 'y'\n","\\end{enumerate*}\n"],"text/markdown":["1. 'x'\n","2. 'y'\n","\n","\n"],"text/plain":["[1] \"x\" \"y\""]},"metadata":{},"output_type":"display_data"}],"source":["ls()"]},{"cell_type":"markdown","id":"1b431f69-ed00-4857-943f-8faa453baaf8","metadata":{"id":"1b431f69-ed00-4857-943f-8faa453baaf8"},"source":["Honestly, the `c` function is perhaps one of the most important in R. It concatenates its arguments into a vector."]},{"cell_type":"code","execution_count":null,"id":"e720c370-2a2e-4c2f-95b4-e14b9aa96c3e","metadata":{"id":"e720c370-2a2e-4c2f-95b4-e14b9aa96c3e","outputId":"7ec7d91e-e68f-4c4e-d022-6afceb962545"},"outputs":[{"data":{"text/html":["\n","
  1. 6
  2. 30
  3. 48
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 6\n","\\item 30\n","\\item 48\n","\\end{enumerate*}\n"],"text/markdown":["1. 6\n","2. 30\n","3. 48\n","\n","\n"],"text/plain":["[1] 6 30 48"]},"metadata":{},"output_type":"display_data"}],"source":["z = c(1, 5, 8)\n","# Most operations are elementwise\n","z + 5 * z"]},{"cell_type":"markdown","id":"742b8b3a-6dca-4f83-a751-201e8c96761e","metadata":{"id":"742b8b3a-6dca-4f83-a751-201e8c96761e"},"source":["But be careful: R will guess what you want to do.\n","In this case, adding a vector of length 3 to a vector of length 6 recycles the shorter vector, repeating it twice to match the longer one."]},{"cell_type":"code","execution_count":null,"id":"8454ee3a-0e07-481a-a683-a0cfe78d016b","metadata":{"id":"8454ee3a-0e07-481a-a683-a0cfe78d016b","outputId":"e2e5665e-ba73-4617-df14-83844aa76e85"},"outputs":[{"data":{"text/html":["\n","
  1. 2
  2. 10
  3. 16
  4. 2
  5. 10
  6. 16
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 2\n","\\item 10\n","\\item 16\n","\\item 2\n","\\item 10\n","\\item 16\n","\\end{enumerate*}\n"],"text/markdown":["1. 2\n","2. 10\n","3. 16\n","4. 2\n","5. 10\n","6. 16\n","\n","\n"],"text/plain":["[1] 2 10 16 2 10 16"]},"metadata":{},"output_type":"display_data"}],"source":["z + c(z, z)"]},{"cell_type":"markdown","id":"9a599322-1ad2-4905-b369-1f9c6ca5d0c0","metadata":{"id":"9a599322-1ad2-4905-b369-1f9c6ca5d0c0"},"source":["R's Boolean values are `TRUE` and `FALSE`."]},{"cell_type":"code","execution_count":null,"id":"a3b3867c-dceb-4008-a66e-ef60f6e9285c","metadata":{"id":"a3b3867c-dceb-4008-a66e-ef60f6e9285c","outputId":"0bfffd2f-9a13-493d-f91a-41876ca8a457"},"outputs":[{"data":{"text/html":["FALSE"],"text/latex":["FALSE"],"text/markdown":["FALSE"],"text/plain":["[1] FALSE"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["TRUE"],"text/latex":["TRUE"],"text/markdown":["TRUE"],"text/plain":["[1] TRUE"]},"metadata":{},"output_type":"display_data"}],"source":["3 == 4\n","3 == 3"]},{"cell_type":"markdown","id":"21225bb3-1c92-4dbc-8ebd-bb9a50db0736","metadata":{"id":"21225bb3-1c92-4dbc-8ebd-bb9a50db0736"},"source":["## Control flow\n","R has `for` loops, `while` loops and `if`/`else` conditionals. The syntax `a : b` creates a vector of consecutive values starting at `a` and ending at `b`. 
Note that indentation doesn't mean anything in R; you use curly braces to group the statements that belong to a `for`, `if` or `while`."]},{"cell_type":"code","execution_count":null,"id":"dcd0116d-beeb-4056-8182-13a6ce153b9f","metadata":{"id":"dcd0116d-beeb-4056-8182-13a6ce153b9f","outputId":"0182491b-92a9-4572-838c-7424c60953d3"},"outputs":[{"name":"stdout","output_type":"stream","text":["[1] \"i is small\"\n","[1] \"i is small\"\n","[1] \"i is small\"\n","[1] \"i is large\"\n","[1] \"i is large\"\n","[1] \"i is large\"\n"]}],"source":["for (i in 1 : 6) {\n","  if (i <= 3) {\n","    print(\"i is small\")\n","  } else {\n","    print(\"i is large\")\n","  }\n","}\n","\n"]},{"cell_type":"markdown","id":"5d814eac-4187-4ede-91d4-9469241d5578","metadata":{"id":"5d814eac-4187-4ede-91d4-9469241d5578"},"source":["## Data structures\n","R's generic container is the list, which is created with `list`. Matrices are created with `matrix`, and data frames are created with `data.frame`."]},{"cell_type":"code","execution_count":null,"id":"74eb0781-a2ea-43e3-ac11-ac47c637c34a","metadata":{"id":"74eb0781-a2ea-43e3-ac11-ac47c637c34a"},"outputs":[],"source":["x = list(a = 1 : 3, b = \"character\", c = list(a = 1 : 4, b = \"character2\"))"]},{"cell_type":"markdown","id":"fe3f5f70-935f-4bd9-9439-5a1fb40d3a33","metadata":{"id":"fe3f5f70-935f-4bd9-9439-5a1fb40d3a33"},"source":["Now `x` is a list containing three elements: `a` is a vector, `b` is a string and `c` is itself another list. You can reference elements of `x` with `$` or with double brackets."]},{"cell_type":"code","execution_count":null,"id":"b9b5d804-a0d5-432f-b1f1-de7af79dcf12","metadata":{"id":"b9b5d804-a0d5-432f-b1f1-de7af79dcf12","outputId":"ca53f9c1-c335-45c4-b97b-8f156385c391"},"outputs":[{"data":{"text/html":["\n","
  1. 1
  2. 2
  3. 3
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 1\n","\\item 2\n","\\item 3\n","\\end{enumerate*}\n"],"text/markdown":["1. 1\n","2. 2\n","3. 3\n","\n","\n"],"text/plain":["[1] 1 2 3"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["'character'"],"text/latex":["'character'"],"text/markdown":["'character'"],"text/plain":["[1] \"character\""]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n","
  1. 1
  2. 2
  3. 3
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 1\n","\\item 2\n","\\item 3\n","\\end{enumerate*}\n"],"text/markdown":["1. 1\n","2. 2\n","3. 3\n","\n","\n"],"text/plain":["[1] 1 2 3"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["'character'"],"text/latex":["'character'"],"text/markdown":["'character'"],"text/plain":["[1] \"character\""]},"metadata":{},"output_type":"display_data"}],"source":["x$a\n","x$b\n","x[[1]]\n","x[[2]]"]},{"cell_type":"markdown","id":"91ec3d2a-b190-47bf-96c2-0654a61201e8","metadata":{"id":"91ec3d2a-b190-47bf-96c2-0654a61201e8"},"source":["A brief technicality: `x[1]` returns a list containing the first element of `x`, whereas `x[[1]]` returns the element itself. Also note that R starts counting at 1 (unlike Python, which starts at 0). Let's create a data frame."]},{"cell_type":"code","execution_count":null,"id":"a526c766-b000-4c45-9fcc-93b693dceddf","metadata":{"id":"a526c766-b000-4c45-9fcc-93b693dceddf","outputId":"c5184615-ccec-40ec-e8f1-9153668c9ecd"},"outputs":[{"data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","\n","\t\n","\t\n","\t\n","\t\n","\t\n","\n","
A data.frame: 5 × 2
indexletter
<int><chr>
3c
4d
5e
6f
7g
\n"],"text/latex":["A data.frame: 5 × 2\n","\\begin{tabular}{ll}\n"," index & letter\\\\\n"," & \\\\\n","\\hline\n","\t 3 & c\\\\\n","\t 4 & d\\\\\n","\t 5 & e\\\\\n","\t 6 & f\\\\\n","\t 7 & g\\\\\n","\\end{tabular}\n"],"text/markdown":["\n","A data.frame: 5 × 2\n","\n","| index <int> | letter <chr> |\n","|---|---|\n","| 3 | c |\n","| 4 | d |\n","| 5 | e |\n","| 6 | f |\n","| 7 | g |\n","\n"],"text/plain":[" index letter\n","1 3 c \n","2 4 d \n","3 5 e \n","4 6 f \n","5 7 g "]},"metadata":{},"output_type":"display_data"}],"source":["x = data.frame(index = 3 : 7, letter = letters[3 : 7])\n","x"]},{"cell_type":"markdown","id":"c45c96ab-8918-4b12-b9ec-8e49fa034159","metadata":{"id":"c45c96ab-8918-4b12-b9ec-8e49fa034159"},"source":["The `$` operator works on dataframes. In addition, bracket notation works as well."]},{"cell_type":"code","execution_count":null,"id":"16fd0319-466f-4105-8d2b-d492fc6d6387","metadata":{"id":"16fd0319-466f-4105-8d2b-d492fc6d6387","outputId":"ed9b4365-18c5-4459-d78e-47d56e5dd12f"},"outputs":[{"data":{"text/html":["\n","
  1. 3
  2. 4
  3. 5
  4. 6
  5. 7
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 3\n","\\item 4\n","\\item 5\n","\\item 6\n","\\item 7\n","\\end{enumerate*}\n"],"text/markdown":["1. 3\n","2. 4\n","3. 5\n","4. 6\n","5. 7\n","\n","\n"],"text/plain":["[1] 3 4 5 6 7"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","\n","\t\n","\t\n","\n","
A data.frame: 2 × 2
indexletter
<int><chr>
13c
24d
\n"],"text/latex":["A data.frame: 2 × 2\n","\\begin{tabular}{r|ll}\n"," & index & letter\\\\\n"," & & \\\\\n","\\hline\n","\t1 & 3 & c\\\\\n","\t2 & 4 & d\\\\\n","\\end{tabular}\n"],"text/markdown":["\n","A data.frame: 2 × 2\n","\n","| | index <int> | letter <chr> |\n","|---|---|---|\n","| 1 | 3 | c |\n","| 2 | 4 | d |\n","\n"],"text/plain":[" index letter\n","1 3 c \n","2 4 d "]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["'c'"],"text/latex":["'c'"],"text/markdown":["'c'"],"text/plain":["[1] \"c\""]},"metadata":{},"output_type":"display_data"}],"source":["x[,1]\n","x[1 : 2,]\n","x[1,2]"]},{"cell_type":"markdown","id":"4100c894-3187-4e0e-bcf9-bbe487847a1d","metadata":{"id":"4100c894-3187-4e0e-bcf9-bbe487847a1d"},"source":["Finally, let's cover matrices."]},{"cell_type":"code","execution_count":null,"id":"bcb357e6-ef33-409e-9a14-1f1d06abb94c","metadata":{"id":"bcb357e6-ef33-409e-9a14-1f1d06abb94c","outputId":"8c0faef7-8b83-4a3e-92cf-7d1514a56646"},"outputs":[{"data":{"text/html":["\n","\n","\n","\t\n","\t\n","\t\n","\n","
A matrix: 3 × 2 of type int
14
25
36
\n"],"text/latex":["A matrix: 3 × 2 of type int\n","\\begin{tabular}{ll}\n","\t 1 & 4\\\\\n","\t 2 & 5\\\\\n","\t 3 & 6\\\\\n","\\end{tabular}\n"],"text/markdown":["\n","A matrix: 3 × 2 of type int\n","\n","| 1 | 4 |\n","| 2 | 5 |\n","| 3 | 6 |\n","\n"],"text/plain":[" [,1] [,2]\n","[1,] 1 4 \n","[2,] 2 5 \n","[3,] 3 6 "]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n","\n","\n","\t\n","\t\n","\n","
A matrix: 2 × 3 of type int
135
246
\n"],"text/latex":["A matrix: 2 × 3 of type int\n","\\begin{tabular}{lll}\n","\t 1 & 3 & 5\\\\\n","\t 2 & 4 & 6\\\\\n","\\end{tabular}\n"],"text/markdown":["\n","A matrix: 2 × 3 of type int\n","\n","| 1 | 3 | 5 |\n","| 2 | 4 | 6 |\n","\n"],"text/plain":[" [,1] [,2] [,3]\n","[1,] 1 3 5 \n","[2,] 2 4 6 "]},"metadata":{},"output_type":"display_data"}],"source":["x = matrix( 1 : 6, 3, 2)\n","x\n","y = matrix( 1 : 6, 2, 3)\n","y"]},{"cell_type":"code","execution_count":null,"id":"00c9c5a4-602d-4504-baa6-6dc1f3073172","metadata":{"id":"00c9c5a4-602d-4504-baa6-6dc1f3073172","outputId":"5987ec01-4434-41c6-cd5a-d689233ee1d2"},"outputs":[{"data":{"text/html":["\n","
  1. 1
  2. 4
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 1\n","\\item 4\n","\\end{enumerate*}\n"],"text/markdown":["1. 1\n","2. 4\n","\n","\n"],"text/plain":["[1] 1 4"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["\n","
  1. 1
  2. 2
  3. 3
\n"],"text/latex":["\\begin{enumerate*}\n","\\item 1\n","\\item 2\n","\\item 3\n","\\end{enumerate*}\n"],"text/markdown":["1. 1\n","2. 2\n","3. 3\n","\n","\n"],"text/plain":["[1] 1 2 3"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["4"],"text/latex":["4"],"text/markdown":["4"],"text/plain":["[1] 4"]},"metadata":{},"output_type":"display_data"}],"source":["x[1,]\n","x[,1]\n","x[1, 2]"]},{"cell_type":"markdown","id":"dedcc9af-e79a-4018-bc18-dcbc08c0a3a8","metadata":{"id":"dedcc9af-e79a-4018-bc18-dcbc08c0a3a8"},"source":["## Functions\n","\n","R has functions and uses so-called lexical scoping. Arguments can be passed by position or by name in function calls. But, just like in Python, don't get too cute with this."]},{"cell_type":"code","execution_count":null,"id":"3e76e1be-76db-425a-9032-cc0c048a2367","metadata":{"id":"3e76e1be-76db-425a-9032-cc0c048a2367","outputId":"a3194347-9d94-4cac-c9e4-de2192156051"},"outputs":[{"data":{"text/html":["8"],"text/latex":["8"],"text/markdown":["8"],"text/plain":["[1] 8"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["8"],"text/latex":["8"],"text/markdown":["8"],"text/plain":["[1] 8"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["8"],"text/latex":["8"],"text/markdown":["8"],"text/plain":["[1] 8"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["8"],"text/latex":["8"],"text/markdown":["8"],"text/plain":["[1] 8"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["9"],"text/latex":["9"],"text/markdown":["9"],"text/plain":["[1] 9"]},"metadata":{},"output_type":"display_data"}],"source":["pow = function(x, n) {\n","  x ^ n\n","}\n","pow(2, 3)\n","pow(x = 2, n = 3)\n","pow(n = 3, x = 2)\n","pow(n = 3, 2)  # n is matched by name; the unnamed 2 fills x\n","pow(3, 2)      # positional: x = 3, n = 2"]},{"cell_type":"markdown","id":"6eea7441-7792-4737-baa5-416ff4e8bf66","metadata":{"id":"6eea7441-7792-4737-baa5-416ff4e8bf66"},"source":["Functions can be passed as arguments to other functions. 
The `...` argument captures any additional arguments, which can then be passed on to another function."]},{"cell_type":"code","execution_count":null,"id":"1f9e5cc2-e1c7-4659-a57d-4bf12d0f3858","metadata":{"id":"1f9e5cc2-e1c7-4659-a57d-4bf12d0f3858","outputId":"efa3fe36-3735-47ed-9198-2640fb61c513"},"outputs":[{"data":{"text/html":["16"],"text/latex":["16"],"text/markdown":["16"],"text/plain":["[1] 16"]},"metadata":{},"output_type":"display_data"},{"data":{"text/html":["14.7781121978613"],"text/latex":["14.7781121978613"],"text/markdown":["14.7781121978613"],"text/plain":["[1] 14.77811"]},"metadata":{},"output_type":"display_data"}],"source":["doublefunc = function(f, x, ...){\n","  f(x, ...) * 2\n","}\n","doublefunc(pow, 2, 3)\n","doublefunc(exp, 2)"]},{"cell_type":"markdown","id":"157fddeb-f7c8-40b6-bb9e-482d0bf36054","metadata":{"id":"157fddeb-f7c8-40b6-bb9e-482d0bf36054"},"source":["Variables live within the scope of the environment in which they are defined. Note below that the variable `c1` isn't found at the top level, since it's only defined within `f`'s environment. Similarly, `e` is not found within `f`, since it's only defined within the function `g`, which is itself defined within `f`. Finally, note that since `c` is a predefined function, I named the variable `c1` instead of `c`. R will let you define `c` with no problem, but it creates confusion. 
To avoid confusion, double-check whether a name is already in use before assigning to it."]},{"cell_type":"code","execution_count":null,"id":"ee4d2309-01d5-4e7c-9524-b8a90affdda0","metadata":{"id":"ee4d2309-01d5-4e7c-9524-b8a90affdda0"},"outputs":[],"source":["a = 2\n","f = function(b){\n","  c1 = 3\n","  g = function(d){\n","    e = 4\n","    return(1)\n","  }\n","  # DOESN'T WORK: e only exists inside g\n","  # print(e)\n","  return(1)\n","}\n","# DOESN'T WORK: c1 only exists inside f\n","# print(c1)"]},{"cell_type":"code","execution_count":null,"id":"da0cbedc-cf62-48ba-aeb5-0907b2ef3a86","metadata":{"id":"da0cbedc-cf62-48ba-aeb5-0907b2ef3a86","outputId":"a0345700-f068-463c-8374-c8b7ee217539"},"outputs":[{"data":{"text/html":["1"],"text/latex":["1"],"text/markdown":["1"],"text/plain":["[1] 1"]},"metadata":{},"output_type":"display_data"}],"source":["f(1)"]}],"metadata":{"kernelspec":{"display_name":"R","language":"R","name":"ir"},"language_info":{"codemirror_mode":"r","file_extension":".r","mimetype":"text/x-r-source","name":"R","pygments_lexer":"r","version":"4.3.1"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":5}