Inserting Pandas DataFrames into a Database Using to_sql() Function
In this article, we will use the sqlalchemy module to create a database connection and the to_sql() function to insert a Pandas DataFrame into a MySQL database.
The advantage of this technique over the one discussed in the previous article is that it inserts an entire DataFrame into a MySQL database in a single call.
Note: The DataFrame we are using is the same 'movies' DataFrame used in the previous article.
import pandas as pd
movies = pd.read_csv('http://bit.ly/imdbratings')  # Read the DataFrame from a CSV file
movies.head(5)  # Display only the top 5 rows of the DataFrame
Step 1:
First, we need to import the sqlalchemy module and create an engine with the parameters user, password, and database name. This is how we connect and log in to the MySQL database.
# import the module
from sqlalchemy import create_engine
# create sqlalchemy engine
engine = create_engine("mysql+pymysql://{user}:{pwd}@localhost/{db}"
                       .format(user="username",
                               pwd="*****",
                               db="movies"))
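Before moving on, it can be worth confirming that the engine actually reaches the database. The sketch below uses an in-memory SQLite URL purely so it runs anywhere without a MySQL server; in practice you would keep the mysql+pymysql URL from above.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for MySQL here so the snippet is self-contained;
# replace the URL with "mysql+pymysql://user:pwd@localhost/movies" in practice.
engine = create_engine("sqlite://")

# Opening a connection and running a trivial query is a cheap way to
# confirm the URL (and, for MySQL, the credentials) are correct.
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)  # 1 if the engine is reachable
```

If the URL or credentials are wrong, create_engine itself usually succeeds; the error only surfaces when a connection is first opened, which is why the explicit check above is useful.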
Step 2:
Once we're connected, we can export the entire DataFrame to MySQL using the to_sql() function with the parameters table name, engine, if_exists, and chunksize.
# Insert the entire DataFrame into MySQL
movies.to_sql('movies_details', con=engine, if_exists='append', chunksize=1000)
Now, let’s take a closer look at what each of these parameters is doing in our code.
- movies_details is the name of the table into which we want to insert our DataFrame.
- con = engine provides the connection details (recall that we created engine using our authentication details in Step 1).
- if_exists = 'append' checks whether the specified table already exists: if it does, the new rows are appended to it; if it doesn't, the table is created.
- chunksize writes records in batches of the given size. By default, all rows are written at once.
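The parameters above can be exercised end to end with a tiny stand-in DataFrame. Again, an in-memory SQLite engine replaces MySQL here only so the sketch is runnable anywhere; the to_sql() call itself is identical for either backend.

```python
import pandas as pd
from sqlalchemy import create_engine

# A tiny stand-in for the 'movies' DataFrame used in the article.
df = pd.DataFrame({"title": ["The Shawshank Redemption", "The Godfather"],
                   "star_rating": [9.3, 9.2]})

# In-memory SQLite engine; a MySQL URL would work the same way.
engine = create_engine("sqlite://")

# The first call creates movies_details (the table doesn't exist yet);
# the second call appends the same rows, doubling the row count.
df.to_sql("movies_details", con=engine, if_exists="append",
          index=False, chunksize=1000)
df.to_sql("movies_details", con=engine, if_exists="append",
          index=False, chunksize=1000)

# Read the table back to confirm both batches landed.
out = pd.read_sql("SELECT * FROM movies_details", con=engine)
print(len(out))  # 4
```

Note the index=False argument: without it, to_sql() also writes the DataFrame's index as an extra column, which is rarely what you want for an append-style table.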
Hope this article helps you efficiently connect to a database and insert a Pandas DataFrame into it.