Creating a sqlite database from CSVs with Python

This blog post demonstrates how to build a sqlite database from CSV files.

Python is perfect language for this task because it has great libraries for sqlite and CSV DataFrames.

Creating a sqlite database

sqlite is a lightweight database that can be started as an empty text file. You can create the file with touch my_data.db or with this equivalent Python code:

from pathlib import Path

A zero byte text file is a great starting point for a lightweight database!

Creating sqlite table

Create a database connection and cursor to execute queries.

import sqlite3

conn = sqlite3.connect('my_data.db')
c = conn.cursor()

Execute a query that’ll create a users table with user_id and username columns.

c.execute('''CREATE TABLE users (user_id int, username text)''')

Load CSV file into sqlite table

Suppose you have the following users.csv file:


Pandas makes it easy to load this CSV data into a sqlite table:

import pandas as pd

# load the data into a Pandas DataFrame
users = pd.read_csv('users.csv')
# write the data to a sqlite table
users.to_sql('users', conn, if_exists='append', index = False)

The to_sql method makes it easy to write DataFrames to databases.

Fetch values from sqlite table

Fetch all the rows from the users table:

c.execute('''SELECT * FROM users''').fetchall() # [(1, 'pokerkid'), (2, 'crazyken')]

The fetchall() method returns an array of tuples.

c.execute() returns a sqlite3.Cursor object. Cursors can be thought of as iterators in the database world.

Load another CSV into the databases

Suppose you have the following orders.csv file:


Create a table and then load the orders data into the database.

c.execute('''CREATE TABLE orders (order_id int, user_id int, item_name text)''')
orders = pd.read_csv('orders.csv') # load to DataFrame
orders.to_sql('orders', conn, if_exists='append', index = False) # write to sqlite table

Fetch results of database join

Join the users and orders tables on the user_id value and print the results:

c.execute('''SELECT * FROM users u LEFT JOIN orders o ON u.user_id = o.user_id''')

Here’s the array that’s returned:

[(1, 'pokerkid', 1, 1, 'speaker'),
 (1, 'pokerkid', 2, 1, 'phone'),
 (2, 'crazyken', 3, 2, 'spoon'),
 (2, 'crazyken', 4, 2, 'fork'),
 (2, 'crazyken', 5, 2, 'speaker')]

You can also read the SQL query directly into a Pandas DataFrame.

pd.read_sql('''SELECT * FROM users u LEFT JOIN orders o ON u.user_id = o.user_id''', conn)

Next steps

Python’s build in sqlite library coupled with Pandas DataFrames makes it easy to load CSV data into sqlite databases.

sqlite databases are great for local experimentation and are used extensively on mobile phones. It’s a great database when you’d like relational database query functionality without the overhead of Postgres.

Python’s great support for sqlite will make you love it in no time.

Leave a Reply

Your email address will not be published. Required fields are marked *