Python

Python background

First, you’ll need a place to program in Python for data analysis, and Python offers a dizzying array of options. A first choice is whether you’ll use Python locally (installed on your computer) or in the cloud. The cloud options take care of a lot of installation problems, in exchange for a loss of control and, typically, far fewer computing resources unless you pay for stronger cloud computing. A second choice is whether you’ll program in notebook environments or in plain code editors. Notebooks mix code and documentation and are especially useful for data analysis. Pure code editors and integrated development environments (IDEs) are preferable for writing software. Here’s a list of some of the things I’ve tried and liked.

  • Jupyter and JupyterLab These are the most popular notebook solutions for Python. You can run them in the cloud or locally; either way, they run in a browser as a web page. There are hosted solutions and solutions that run in local virtual environments. This is the solution that we use for this class. Personally, I use JupyterLab. Probably the easiest way to get started with JupyterLab is to install Anaconda Navigator. This will install a JupyterLab icon that you can click to start. Also, you can play around with everything in this document with Binder.

  • Google Colaboratory This is a cloud-hosted solution that is about as easy as it gets and is really well done. Files can be stored on Google Drive. It’s really a wonderful way to get started with Python and notebooks.

  • repl.it A cloud IDE and REPL for many languages, including Python. It’s a great place to start learning Python programming.

  • paperspace Paperspace has hosted Jupyter notebooks where you can buy really powerful instances by the minute. It’s a cost-effective solution for running big jobs if you don’t have local computing resources. Unlike a lot of other cloud providers, they are very beginner friendly.

  • Anaconda This is a front end to the conda package and environment manager. Anaconda can install many Python IDEs, including Jupyter, JupyterLab, Spyder, and PyCharm, and it manages virtual environments.

  • emacs I love emacs. It’s an editor/IDE for lots of languages, but it has a steep learning curve. I use elpy (an add-on) in emacs. I’ll mention Vi, another editor, only to note that it exists.

  • vscode A nice IDE from Microsoft for both notebooks and code development. It’s free and cross-platform. I particularly like how well it integrates with the Windows Subsystem for Linux.

  • spyder and pycharm IDEs dedicated to Python. I don’t use these, but I’ve played around with them and can see why they’re so popular. You can use them through Anaconda or install them natively.

  • atom, sublime, notepad++, eclipse, … These are editors and IDEs designed for many languages.
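Whichever option you pick from the list above, it can help to confirm where your Python is actually running. The following is a minimal sketch, not an official recipe: the function name `describe_environment` is made up for illustration, and the two module checks are common heuristics (Colab loads a `google.colab` module, and Jupyter kernels load `ipykernel`) rather than guaranteed tests.

```python
import platform
import sys


def describe_environment():
    """Return a small dictionary describing the running Python.

    The two "in_..." entries are heuristics (an assumption of this sketch):
    they only check whether a module has already been loaded.
    """
    return {
        # Version string of the interpreter, for example "3.11.4".
        "python_version": platform.python_version(),
        # Path to the interpreter, which reveals the active virtual environment.
        "executable": sys.executable,
        # Heuristic: Google Colab sessions load the google.colab module.
        "in_colab": "google.colab" in sys.modules,
        # Heuristic: Jupyter notebooks run inside an ipykernel process.
        "in_jupyter_kernel": "ipykernel" in sys.modules,
    }


env_info = describe_environment()
for key, value in env_info.items():
    print(f"{key}: {value}")
```

Run from a plain terminal, both heuristic entries should come back `False`; inside a notebook, at least `in_jupyter_kernel` would typically be `True`.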

Notebooks

Notebooks are going to be especially useful for us, as they’re a great way to do data analyses. With notebooks, you can merge richer documentation together with analysis code. You can take this to the extreme, and have solutions that create reproducible final documents. This book is an example, where the entire thing is written in jupyter-book. We’ll discuss this idea a little more when we discuss reproducible research. Alternatively, you can use your notebook as a working document that

Most notebook solutions have text blocks and code blocks. The text is marked up in a markup language called “Markdown”. You can find a guide to markdown syntax here: https://www.markdownguide.org/cheat-sheet/. It should take you very little time to learn markdown.
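To make the text-block and code-block idea concrete, here is a rough sketch of what a Jupyter notebook file looks like underneath: a JSON document holding a list of cells, each tagged as Markdown or code. The helper name `make_notebook` and the example cell contents are invented for illustration, and the layout follows my understanding of the version 4 notebook format, so treat it as a sketch, not a reference.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook with one Markdown cell and one code cell."""
    return {
        "cells": [
            {
                # A text block, written in Markdown.
                "cell_type": "markdown",
                "metadata": {},
                "source": markdown_text,
            },
            {
                # A code block; it has not been run, so there are no outputs yet.
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook(
    "# My analysis\n\nSome *emphasized* text and a [link](https://www.markdownguide.org/cheat-sheet/).",
    "total = 1 + 1\nprint(total)",
)

# Serialize to the JSON text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file with an `.ipynb` extension should give something that notebook tools can open, though in practice you would let Jupyter or Colab create these files for you.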

If you’re very new to notebooks in Python, I would suggest starting with Colab. The Colab documentation is useful.