sqlite in python#
An sqlite3 library ships with python. In this tutorial, we’ll discuss how to utilize this library and read sqlite tables into pandas. With this, you can generalize to other python APIs to other databases. First, let’s continue on with our work from the previous notebook. A nice little tutorial can be found here.
import sqlite3 as sq3
import pandas as pd
##Download the database using the unix command wget
!wget https://raw.githubusercontent.com/smart-stats/ds4bio_book/main/book/sql/opioid.db
## Create the connection
con = sq3.connect("opioid.db")
## cursor() creates an object that can execute functions in the sqlite cursor
sql = con.cursor()
## execute a command, loop through the output and print out
for row in sql.execute("select * from county_info limit 5;"):
print(row)
# you have to close the connection
con.close
--2024-04-15 18:53:36-- https://raw.githubusercontent.com/smart-stats/ds4bio_book/main/book/sql/opioid.db
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7774208 (7.4M) [application/octet-stream]
Saving to: ‘opioid.db’
opioid.db 100%[===================>] 7.41M 21.5MB/s in 0.3s
2024-04-15 18:53:36 (21.5 MB/s) - ‘opioid.db’ saved [7774208/7774208]
('1', 'AUTAUGA', 'AL', '01001', '1', '1', 'Autauga', 'Autauga County, Alabama', 'B01003_001', '2006', '51328', 'Autauga, AL', '594.44')
('2', 'BALDWIN', 'AL', '01003', '1', '3', 'Baldwin', 'Baldwin County, Alabama', 'B01003_001', '2006', '168121', 'Baldwin, AL', '1589.78')
('3', 'BARBOUR', 'AL', '01005', '1', '5', 'Barbour', 'Barbour County, Alabama', 'B01003_001', '2006', '27861', 'Barbour, AL', '884.88')
('4', 'BIBB', 'AL', '01007', '1', '7', 'Bibb', 'Bibb County, Alabama', 'B01003_001', '2006', '22099', 'Bibb, AL', '622.58')
('5', 'BLOUNT', 'AL', '01009', '1', '9', 'Blount', 'Blount County, Alabama', 'B01003_001', '2006', '55485', 'Blount, AL', '644.78')
<function Connection.close()>
Let’s read this dataset into pandas.
con = sq3.connect("opioid.db")
county_info = pd.read_sql_query("SELECT * from county_info", con)
# you have to close the connection
con.close
county_info.head()
BUYER_COUNTY | BUYER_STATE | countyfips | STATE | COUNTY | county_name | NAME | variable | year | population | Areaname | LND110210D | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | AUTAUGA | AL | 01001 | 1 | 1 | Autauga | Autauga County, Alabama | B01003_001 | 2006 | 51328 | Autauga, AL | 594.44 |
1 | 2 | BALDWIN | AL | 01003 | 1 | 3 | Baldwin | Baldwin County, Alabama | B01003_001 | 2006 | 168121 | Baldwin, AL | 1589.78 |
2 | 3 | BARBOUR | AL | 01005 | 1 | 5 | Barbour | Barbour County, Alabama | B01003_001 | 2006 | 27861 | Barbour, AL | 884.88 |
3 | 4 | BIBB | AL | 01007 | 1 | 7 | Bibb | Bibb County, Alabama | B01003_001 | 2006 | 22099 | Bibb, AL | 622.58 |
4 | 5 | BLOUNT | AL | 01009 | 1 | 9 | Blount | Blount County, Alabama | B01003_001 | 2006 | 55485 | Blount, AL | 644.78 |
This should get you started on working with SQL in python. Personally, I find using python easier than directly using the SQL, so the basic approach in this section is how I would suggest that you operate.