Python |
||
Panda - SQL
Panda DataFrame itself provide many powerful tools for data manipulation and simple to use, but depending on the skillset which you are more familiar with, you might have thought 'it would be good if I can use sql to the panda table'. This page is to show you how to convert the panda table (DataFrame) to sql database and manipulate the data using SQL.
Installation of SQL engine
First, you need to install a sql engine and that can be installed as followed. (At the time of writing this note (Jul 2020), I was using Python 3.7.5 and the installed sqlalchemy version is shown below).
C:\>pip install sqlalchemy
Collecting sqlalchemy Downloading SQLAlchemy-1.3.18-cp37-cp37m-win_amd64.whl (1.2 MB) |¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦| 1.2 MB 1.6 MB/s Installing collected packages: sqlalchemy Successfully installed sqlalchemy-1.3.18
Connection to SQL engine
Following is a short example that connect to an sql engine and convert a panda table to the sql table, retrieve data from the table.
import sys import pandas as pd from sqlalchemy import create_engine
df = pd.read_csv('Sacramentorealestatetransactions.csv')
sqlEngine = create_engine('sqlite://', echo=False) df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace')
sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall();
In this example, I am creating a panda table by reading an existing csv file by pd.read_csv('Sacramentorealestatetransactions.csv'). You can get this csv file as described in this page. For the simplicity, you can create your own / simple table by yourself if you like.
sqlEngine = create_engine('sqlite://', echo=False) : this create an SQL engine and name it as 'sqlEngine'. From follow on, you will get access to the SQL engine using this name.
df.to_sql('TransactionDB', con=sqlEngine, if_exists='replace') : this line mean 'convert the panda table named 'df' into an SQL table named 'TransactionDB'. if_exists='replace' mean 'if there is already a table named 'TransactionDB', replace the table with this new data.
sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall() : this line mean 'execute the specified SQL query and save it to the variable 'sqlList'. As you may guess here, the queried SQL result is returned as a list.
Basic Check
Once the above code runs without any error, I would suggest a few basic checkups to see everything is done as intended.
First check if the csv file is successfully and the data is stored to the variable 'df'.
>>> df.head()
street city zip ... price latitude longitude 0 3526 HIGH ST SACRAMENTO 95838 ... 59222 38.631913 -121.434879 1 51 OMAHA CT SACRAMENTO 95823 ... 68212 38.478902 -121.431028 2 2796 BRANCH ST SACRAMENTO 95815 ... 68880 38.618305 -121.443839 3 2805 JANETTE WAY SACRAMENTO 95815 ... 69307 38.616835 -121.439146 4 6001 MCMAHON DR SACRAMENTO 95824 ... 81900 38.519470 -121.435768
[5 rows x 12 columns]
Now check if the sql command is properly executed by sqlList = sqlEngine.execute("SELECT * FROM TransactionDB").fetchall();
>>> sqlList
Squeezed Text(1832 lines)
As shown here, the data would not be printed in case the size of the data is too much. You can expand the data or just print out one element just to make it sure it worked as shown below.
>>> sqlList[0]
(0, '3526 HIGH ST', 'SACRAMENTO', 95838, 'CA', 2, 1, 836, 'Residential', 'Wed May 21 00:00:00 EDT 2008', 59222, 38.631913, -121.43487900000001)
If you like, you can convert the SQL result (a List) back to panda DataFrame and process it further using Panda DataFrame functionality explained in this page.
>>> sqldf = pd.DataFrame(sqlList) >>> sqldf.head()
0 1 2 ... 10 11 12 0 0 3526 HIGH ST SACRAMENTO ... 59222 38.631913 -121.434879 1 1 51 OMAHA CT SACRAMENTO ... 68212 38.478902 -121.431028 2 2 2796 BRANCH ST SACRAMENTO ... 68880 38.618305 -121.443839 3 3 2805 JANETTE WAY SACRAMENTO ... 69307 38.616835 -121.439146 4 4 6001 MCMAHON DR SACRAMENTO ... 81900 38.519470 -121.435768
[5 rows x 13 columns]
|
||