Python

 

 

 

 

Pandas - SQL

 

A pandas DataFrame already provides many powerful, easy-to-use tools for data manipulation. Depending on the skill set you are more familiar with, though, you might have thought 'it would be good if I could use SQL on a pandas table'. This page shows how to convert a pandas table (DataFrame) into a SQL database table and manipulate the data using SQL.

 

 

 

Installation of SQLAlchemy

 

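First, you need to install SQLAlchemy, a library that pandas can use to talk to SQL databases. It can be installed with pip as follows. (SQLite, the database used in this example, already ships with the standard Python installation.) At the time of writing this note (Jul 2020), I was using Python 3.7.5, and the installed SQLAlchemy version is shown below.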

 

C:\>pip install sqlalchemy

 

Collecting sqlalchemy

  Downloading SQLAlchemy-1.3.18-cp37-cp37m-win_amd64.whl (1.2 MB)

     |████████████████████████████████| 1.2 MB 1.6 MB/s

Installing collected packages: sqlalchemy

Successfully installed sqlalchemy-1.3.18
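To confirm the installation from inside Python (rather than from the pip output), a quick check like the following should work. It only imports the package and prints its version string; the exact version you see depends on what pip installed on your machine.

```python
# Confirm that SQLAlchemy can be imported and show which version is installed
import sqlalchemy

print(sqlalchemy.__version__)  # prints the installed version string
```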

 

 

 

Connection to SQL engine

 

The following is a short example that connects to a SQL engine, converts a pandas table into a SQL table, and retrieves data from that table.

 

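    import pandas as pd
    from sqlalchemy import create_engine

    # Read the CSV file into a pandas DataFrame
    df = pd.read_csv('Sacramentorealestatetransactions.csv')

    # Create an in-memory SQLite engine ('sqlite://' with no file path)
    sqlEngine = create_engine('sqlite://', echo=False)

    # Write the DataFrame into a SQL table named 'TransactionDB'
    df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace')

    # Run a SQL query and fetch all resulting rows as a list
    # (Engine.execute() exists in SQLAlchemy 1.3 but was removed in 2.0)
    sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall()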

 

In this example, I create a pandas table by reading an existing CSV file with pd.read_csv('Sacramentorealestatetransactions.csv'). You can get this CSV file as described in this page. For simplicity, you can also create your own small table instead if you like.
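If you don't have the CSV file, a minimal sketch like this one builds a tiny DataFrame by hand and runs the same to_sql step. The street names, cities and prices are made-up sample values, not rows from the real file; only the table name 'TransactionDB' is taken from the example above.

```python
import pandas as pd
from sqlalchemy import create_engine

# A small, hypothetical table standing in for the Sacramento CSV data
df = pd.DataFrame({
    'street': ['1 MAIN ST', '22 OAK AVE', '5 ELM CT'],
    'city': ['SACRAMENTO', 'SACRAMENTO', 'DAVIS'],
    'price': [59222, 68212, 120000],
})

# Same steps as in the example: in-memory SQLite engine, then DataFrame -> SQL table
sqlEngine = create_engine('sqlite://', echo=False)
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace')

# Read the whole table back to confirm the rows arrived
print(pd.read_sql_table('TransactionDB', con=sqlEngine))
```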

 

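sqlEngine = create_engine('sqlite://', echo=False) : this creates a SQL engine and names it 'sqlEngine'. From here on, you access the SQL engine through this name. The URL 'sqlite://' with no file path means an in-memory SQLite database, which disappears when the program ends. echo=False turns off the logging of every SQL statement that SQLAlchemy issues (set echo=True if you want to see them).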
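If you want the table to survive after the program ends, you can point the engine at a file instead of memory. The sketch below assumes a hypothetical file name 'transactions.db' inside a temporary folder; the 'sqlite:///' prefix (three slashes) followed by a path is the standard SQLAlchemy URL form for a file-based SQLite database.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({'city': ['SACRAMENTO', 'DAVIS'], 'price': [59222, 120000]})

# 'sqlite://'          -> in-memory database, lost when the engine goes away
# 'sqlite:///<path>'   -> database stored in a file on disk
# 'transactions.db' is just an example file name
db_path = os.path.join(tempfile.mkdtemp(), 'transactions.db')
fileEngine = create_engine('sqlite:///' + db_path, echo=False)
df.to_sql('TransactionDB', con=fileEngine, if_exists='replace', index=False)
fileEngine.dispose()  # close the pooled connections to the file

# A brand-new engine pointing at the same file still sees the table
reopened = create_engine('sqlite:///' + db_path)
print(pd.read_sql_table('TransactionDB', con=reopened))
```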

 

df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace') : this line means 'convert the pandas table named df into a SQL table named TransactionDB'. if_exists='replace' means 'if there is already a table named TransactionDB, drop it and replace it with this new data'.
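To see what if_exists does in each mode, here is a small sketch using made-up data. With 'fail' (the default), pandas raises a ValueError if the table already exists; 'append' keeps the existing rows and adds new ones; 'replace' drops the table and recreates it. I pass index=False here so the DataFrame index is not written as an extra column; the example above keeps the default (index=True). The helper row_count() is hypothetical, written just for this illustration.

```python
import pandas as pd
from sqlalchemy import create_engine

sqlEngine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'city': ['SACRAMENTO', 'DAVIS'], 'price': [59222, 120000]})


def row_count(engine):
    """Hypothetical helper: number of rows currently in TransactionDB."""
    result = pd.read_sql_query('SELECT COUNT(*) AS n FROM TransactionDB', con=engine)
    return int(result['n'].iloc[0])


# First write: the table does not exist yet, so it is created (2 rows)
df.to_sql('TransactionDB', con=sqlEngine, index=False)

# if_exists='fail' (the default): writing to an existing table raises ValueError
failed = False
try:
    df.to_sql('TransactionDB', con=sqlEngine, if_exists='fail', index=False)
except ValueError as err:
    failed = True
    print('fail ->', err)

# if_exists='append': keep the existing rows and insert df's rows after them
df.to_sql('TransactionDB', con=sqlEngine, if_exists='append', index=False)
after_append = row_count(sqlEngine)   # 2 + 2 = 4

# if_exists='replace': drop the table and recreate it from df
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace', index=False)
after_replace = row_count(sqlEngine)  # back to 2
print(after_append, after_replace)
```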

 

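sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall() : this line means 'execute the specified SQL query and save the result in the variable sqlList'. As you may guess, fetchall() returns the query result as a list, with one tuple-like row per record. Note that Engine.execute() is available in SQLAlchemy 1.3 (the version used here) but was removed in SQLAlchemy 2.0.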
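If you are on a newer SQLAlchemy where Engine.execute() no longer exists, a sketch like the following runs the same query through an explicit connection and text(), a pattern that works in 1.3 as well as 2.0. It also shows a bound parameter (:max_price, a name I chose for this example) instead of pasting values into the SQL string. The sample rows are made up.

```python
import pandas as pd
from sqlalchemy import create_engine, text

sqlEngine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'city': ['SACRAMENTO', 'SACRAMENTO', 'DAVIS'],
                   'price': [59222, 68212, 120000]})
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace', index=False)

# Connection.execute() + text() works across SQLAlchemy 1.3, 1.4 and 2.0
with sqlEngine.connect() as conn:
    sqlList = conn.execute(text('SELECT * FROM TransactionDB')).fetchall()

    # Bound parameter :max_price is filled in from the dictionary
    cheap = conn.execute(
        text('SELECT city, price FROM TransactionDB WHERE price < :max_price'),
        {'max_price': 65000},
    ).fetchall()

print(sqlList)  # list of tuple-like row objects
print(cheap)
```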

 

 

 

Basic Check

 

Once the above code runs without any error, I suggest a few basic checks to confirm that everything was done as intended.

 

First, check that the CSV file was read successfully and the data is stored in the variable 'df'.

 

>>> df.head()

 

             street        city    zip  ...  price   latitude   longitude

0      3526 HIGH ST  SACRAMENTO  95838  ...  59222  38.631913 -121.434879

1       51 OMAHA CT  SACRAMENTO  95823  ...  68212  38.478902 -121.431028

2    2796 BRANCH ST  SACRAMENTO  95815  ...  68880  38.618305 -121.443839

3  2805 JANETTE WAY  SACRAMENTO  95815  ...  69307  38.616835 -121.439146

4   6001 MCMAHON DR  SACRAMENTO  95824  ...  81900  38.519470 -121.435768

 

[5 rows x 12 columns]

 

 

Now check that the SQL command sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall() was executed properly.

 

>>> sqlList

 

Squeezed Text(1832 lines)

 

As shown here, the data is not printed in full when the output is too large (the IDLE shell collapses it into a 'Squeezed Text' entry). You can expand the data, or just print out one element to make sure it worked, as shown below.

 

>>> sqlList[0]

 

(0, '3526 HIGH ST', 'SACRAMENTO', 95838, 'CA', 2, 1, 836, 'Residential', 'Wed May 21 00:00:00 EDT 2008', 59222, 38.631913, -121.43487900000001)
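Instead of scrolling through the whole result, two quick checks can confirm the conversion: compare the SQL row count with the DataFrame length, and list the columns SQLite actually created. This is a sketch on made-up sample rows; inspect(...).get_columns() is SQLAlchemy's reflection API. The extra 'index' column appears because to_sql writes the DataFrame index by default (index=True).

```python
import pandas as pd
from sqlalchemy import create_engine, inspect

sqlEngine = create_engine('sqlite://', echo=False)
# Hypothetical sample rows standing in for the CSV data
df = pd.DataFrame({'city': ['SACRAMENTO', 'SACRAMENTO', 'DAVIS'],
                   'price': [59222, 68212, 120000]})
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace')

# 1) The SQL table should hold exactly as many rows as the DataFrame
n_sql = int(pd.read_sql_query('SELECT COUNT(*) AS n FROM TransactionDB',
                              con=sqlEngine)['n'].iloc[0])
print(n_sql, len(df))

# 2) Column names in the SQL table; 'index' comes from to_sql's default index=True
columns = [col['name'] for col in inspect(sqlEngine).get_columns('TransactionDB')]
print(columns)
```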

 

 

If you like, you can convert the SQL result (a list) back to a pandas DataFrame and process it further using the DataFrame functionality explained in this page.

 

>>> sqldf = pd.DataFrame(sqlList)

>>> sqldf.head()

 

   0                 1           2   ...     10         11          12

0   0      3526 HIGH ST  SACRAMENTO  ...  59222  38.631913 -121.434879

1   1       51 OMAHA CT  SACRAMENTO  ...  68212  38.478902 -121.431028

2   2    2796 BRANCH ST  SACRAMENTO  ...  68880  38.618305 -121.443839

3   3  2805 JANETTE WAY  SACRAMENTO  ...  69307  38.616835 -121.439146

4   4   6001 MCMAHON DR  SACRAMENTO  ...  81900  38.519470 -121.435768

 

[5 rows x 13 columns]
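Note that this DataFrame has 13 columns rather than the 12 shown by df.head(), and the columns are numbered 0 to 12 instead of named. The extra first column is the DataFrame index that to_sql wrote as a column named 'index', and the names are lost because pd.DataFrame(sqlList) only receives the row values. A sketch that avoids both issues uses pd.read_sql_query, which returns a DataFrame with real column names, here combined with a GROUP BY query on made-up sample data.

```python
import pandas as pd
from sqlalchemy import create_engine

sqlEngine = create_engine('sqlite://', echo=False)
# Hypothetical sample rows standing in for the CSV data
df = pd.DataFrame({'city': ['SACRAMENTO', 'SACRAMENTO', 'DAVIS'],
                   'price': [59222, 68212, 120000]})
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace')

# Let SQL do the aggregation; the result comes back as a named DataFrame
summary = pd.read_sql_query(
    'SELECT city, COUNT(*) AS n_sales, AVG(price) AS avg_price '
    'FROM TransactionDB GROUP BY city ORDER BY city',
    con=sqlEngine,
)
print(summary)

# index_col='index' turns the column written by to_sql back into the row index
restored = pd.read_sql_query('SELECT * FROM TransactionDB',
                             con=sqlEngine, index_col='index')
print(restored)
```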