Inserting Pandas DataFrames into a Database Using to_sql() Function
In this article, we will use the sqlalchemy module to create a database connection and the to_sql() function to insert a Pandas DataFrame into a MySQL database.
The advantage of this technique over the one discussed in the previous article is that it inserts an entire DataFrame into a MySQL database in a single call.
Note: The DataFrame we are using is the same 'movies' DataFrame used in the previous article.
import pandas as pd
movies = pd.read_csv('http://bit.ly/imdbratings')  # Read the DataFrame from a CSV file
movies.head(5)  # Display only the top 5 rows of the DataFrame
Step 1:
First, we need to import the sqlalchemy module and create an engine with the parameters user, password, and database name. This is how we connect and log in to the MySQL database.
# import the module
from sqlalchemy import create_engine
# create sqlalchemy engine
engine = create_engine("mysql+pymysql://{user}:{pwd}@localhost/{db}"
                       .format(user="username",
                               pwd="*****",
                               db="movies"))
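Before moving on, it can be worth confirming that the engine actually reaches the database. The sketch below uses an in-memory SQLite URL purely so it runs anywhere without a MySQL server; in practice you would keep the mysql+pymysql URL from above.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for MySQL here so the snippet is self-contained;
# replace the URL with "mysql+pymysql://user:pwd@localhost/movies" in practice.
engine = create_engine("sqlite://")

# Opening a connection and running a trivial query is a cheap way to
# confirm the URL (and, for MySQL, the credentials) are correct.
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)  # 1 if the engine is reachable
```

If the URL or credentials are wrong, create_engine itself usually succeeds; the error only surfaces when a connection is first opened, which is why the explicit check above is useful.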
Step 2:
Once we're connected, we can export the entire DataFrame to MySQL using the to_sql() function with the parameters table name, engine, if_exists, and chunksize.
# Insert the entire DataFrame into MySQL
movies.to_sql('movies_details', con=engine, if_exists='append', chunksize=1000)
Now, let’s take a closer look at what each of these parameters is doing in our code.
- movies_details is the name of the table into which we want to insert our DataFrame.
- con = engine provides the connection details (recall that we created engine using our authentication details in Step 1).
- if_exists = 'append' checks whether the specified table already exists: if it does, the new rows are appended to it; if it doesn't, the table is created.
- chunksize writes records in batches of the given size. By default, all rows are written at once.
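The parameters above can be exercised end to end with a tiny stand-in DataFrame. Again, an in-memory SQLite engine replaces MySQL here only so the sketch is runnable anywhere; the to_sql() call itself is identical for either backend.

```python
import pandas as pd
from sqlalchemy import create_engine

# A tiny stand-in for the 'movies' DataFrame used in the article.
df = pd.DataFrame({"title": ["The Shawshank Redemption", "The Godfather"],
                   "star_rating": [9.3, 9.2]})

# In-memory SQLite engine; a MySQL URL would work the same way.
engine = create_engine("sqlite://")

# The first call creates movies_details (the table doesn't exist yet);
# the second call appends the same rows, doubling the row count.
df.to_sql("movies_details", con=engine, if_exists="append",
          index=False, chunksize=1000)
df.to_sql("movies_details", con=engine, if_exists="append",
          index=False, chunksize=1000)

# Read the table back to confirm both batches landed.
out = pd.read_sql("SELECT * FROM movies_details", con=engine)
print(len(out))  # 4
```

Note the index=False argument: without it, to_sql() also writes the DataFrame's index as an extra column, which is rarely what you want for an append-style table.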
Hope this article helps you efficiently connect to a database and insert a Pandas DataFrame into it.