Combining Jupyter with SQL & R
28.01.2022

Combining Jupyter with SQL & R

This article explains how to set up Jupyter Notebook as a platform with MS SQL and R.

The steps that are described in the article are the result of my own experience and provide a complete explanation of the setup.

To follow this article, you should have basic knowledge of Jupyter Notebook, MS SQL and SSMS, and R (you can skip the R part if you want).

This article doesn’t explain how to use Jupyter notebook, SQL, or R.


If you are currently learning either Python, SQL, or R, then you must have already thought about how to combine them.

For example, while learning SQL, you have probably been faced with the need to import data from an excel or CSV file, and that data might have been invalidated, so maybe you first had to clean it manually using Excel.

Or maybe, you had a database in MSSQL (or another kind of SQL Server) and you want to practice Python on that database, but how can you connect to it with python?

There are many reasons to combine Python with SQL and R. One of them is improving the workflow. We can read datasets from files using python, validate that data, visualize it and also pass it to SQL to easily manipulate it. Or maybe we need to connect to our SQL server using python to create visualizations.

This is also a great way to show you skills in all three languages in Jupyter Notebook as the frontend for all three languages.


Here is how you can do it.

Time estimations:

  • Download software and installations: 1-2 hours.
  • Setup the environment: 1-2 hours.
  • And then: Punch in the commands in Jupyter notebook and see how it works.


Download and Install software

Let's begin with downloading all of the software, all of which can be found on google.

For all installations in this article, basic settings are sufficient (meaning you can press the next bottom until the installation is finished).


First, download R, R-Studio, and Rtools.

Start by installing R, follow by installing Rtools, and lastly install R-Studio.

open R-studio.

Go to: Tools --> Install packages --> search for “tidyverse” and click install.

No alt text provided for this image

You can close R-Studio.


Next, download MS SQL and SSMS. First, install MS SQL, then, install SSMS.

After installing both, open SSMS:

right click on your sql server connection and choose properties.

No alt text provided for this image

then, under security, choose server authentication of both sql server and windows authentication mode:

No alt text provided for this image


It is important to restart your computer to allow the authentication mode to apply


Follow the next steps to create a SQL Server user ID with SQL Server authentication:

The following steps are from ibm.com

  1. In the SQL Server Management Studio, open Object Explorer.
  2. Click Server_instance_name > Security > Logins.
  3. Right-click Logins and select New Login.
  4. On the General page, in the Login name field, type the name for a new user.
  5. Select SQL Server authentication.
  6. In the Password field, type a password for the user.
  7. In the Confirm Password field, retype the password that you entered in the Password field.
  8. Depending on the role and permissions that you want to assign to this user, complete one of the following tasks:

  • On the Server Roles page, assign the sysadmin role to the new login ID.
  • If you do not want to assign the sysadmin role to the user, grant minimum permissions to the user by completing the steps that are mentioned in Granting minimum permissions for data collection.

Click OK.


Next, we are going to set up an API with Open Database Connectivity (ODBC) to enable a connection to MSSQL.

Search in the start menu “ODBC “ and make sure to choose 64 bit (or 32 bit, according to your system specifications).

the 32 bit or 64 bit is crucial, things wont work it you will use wrong bit transfer then the system specifications


Launch ODBC.

Click on System DSN --> add

Choose a name and add a description.

Enter the server pathway. You can find it by simply opening SSMS and copying the server pathway in the popup screen of the login authentication. 

No alt text provided for this image
No alt text provided for this image


Click next

Choose SQL Server authentication, and enter the username and password of the new user you created in the previous step.

No alt text provided for this image

Click next --> next --> finish


Next, install Anaconda, and set up Jupyter and all of the required libraries.

Download and install Anaconda.

Open the “anaconda prompt”, you can find it using the search.

No alt text provided for this image


Set up the Jupyter Platform as a Frontend for using SQL and R

Then, run the following code:

conda install conda

conda create –-name Jupyter_SQL_R_env

conda activate Jupyter_SQL_R_env

conda install pandas jupyter bottleneck numexpr seaborn sqlalchemy sqlalchemy-utils pyodbc

conda update –-all

pip install rpy2

pip install ipython-sql

pip install pyodbca
        


Jupyter_SQL_R_env is a conda environment that we create, you can choose any different name instea


for this part, you should understand basic conda commands.

To activate the environment we’ve created.

conda activate Jupyter_SQL_R_env        

Then, to open Jupyter Notebook.

Jupyter Notebook        

After you finish working, terminate the server connection.

Ctrl + C several times to close all connections        


open Jupyter Notebook and create a new notebook

Start by importing the libraries

# Import python librarie

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

# Import sql Extensions librarys

import sqlalchemy

from sqlalchemy_utils import database_exists, create_databases

import pyodbc        

Then, establish a connection between Jupyter and MSSQL.

# If you followed all of the previous steps, you should have all of the details for the connection.

# To find out the Driver, you can go once again to ODBC from the search menu (once again, choose the right bit transfer version). Click on the Drivers tab and scroll down until SQL Server.

No alt text provided for this image

# note that the database can be an existing or a new non-existing database.

SERVER = "DESKTOP-NIR\SQLEXPRESS

DATABASE = "Article Database"

DRIVER = "SQL Server Native Client 11.0"

USERNAME = "ArticleUser"

PASSWORD = "ArticleUser123"

DATABASE_CONNECTION = f'mssql://{USERNAME}:{PASSWORD}@{SERVER}/{DATABASE}?driver={DRIVER}'        

Our next commands are going to create a connection variable. If the database that was chosen above exists, it will connect to it, but if it doesn’t exist, it will create a new database.

engine = sqlalchemy.create_engine(DATABASE_CONNECTION

 

# if the database is does not exist, sqlalchemy will create the database

if not database_exists(DATABASE_CONNECTION):

    create_database(DATABASE_CONNECTION)

else:

    engine.connect()

 

# the connection command and variable

connection = engine.connect())        

Load the extensions required to run SQL queries in jupyter.

%load_ext sq

%sql mssql+pyodbc://ArticleUser:ArticleUser123@mssql

#parameters: username= ArticleUser password= ArticleUser123

#parameters: after the “@” comes the ODBC name, in my case, the name is “mssql”l        

Load the extensions required to run R commands in jupyter.

%load_ext rpy2.ipython        


You have finished setting up Jupyter Notebook as a frontend platform for both SQL and R. Here's how it works.


Using SQL in Jupyter Notebook

By using %%sql inside a jupyter cell, the entire cell becomes a SQL cell, and we can write a SQL query as if we are in the SSMS.

Likewise, %sql is inline magic, which converts the line in the cell into a SQL line.

%%sql

SELECT * FROM Customers        

That allows you to fully use DML, DDL, and DCL languages inside Jupyter.


Passing objects from SQL to Jupyter and vice versa

We can pass queries that get data from MSSQL using the connection variable.

In this example, we can now use the variable query_Table inside jupyter.

query = """

 

SELECT * FROM Table_Name

 

"""

 

query_Table = pd.read_sql_query(query, connection)"        
Note how we use triple quotes """ for a multiline syntax

We have created a python variable that is defined by a SQL query, and we can use query_Table inside Jupyter.

Variables that are queried from SQL can also be used for visualizations, this is a great combo of SQL with python and R.


Passing data frames from python to MSSQL

We can also pass that same variable (query_Table), or any other data frame into a table in MSSQL.

query_Table.to_sql(name="Table_Name", schema="dbo", con=engine, if_exists="replace", index=False)        

We can also use if_exists = append

engine is a variable that we have already defined to establish a connection to our database


Using R in Jupyter Notebook

Just like the %%sql and %sql magic, we can use R.

By using %%R inside a jupyter cell, the entire cell becomes an R cell, and we can use R syntax. Likewise, %R is inline magic, which converts the line in the cell into an R line.

To create inline visualizations, we need to add another syntax.

In this example, we create a frame of 40x15 cm for the visualization.

%%R -i Python_data_frame -w 40 -h 15 --units cm

# Write your code in here        


Hope you enjoyed this article.

Nir Ben Eliahu

תודה רבה לך על השיתוף🙂 אני מזמין אותך לקבוצה שלי: הקבוצה מחברת בין ישראלים ואנשי העולם במגוון תחומים. https://chat.whatsapp.com/BubG8iFDe2bHHWkNYiboeU

Like
Reply

תודה רבה על השיתוף! אני מזמין אותך לקבוצה שלי שמחברת בין ישראלים לשאר העולם במגוון נושאים מטרת הקבוצה לשתף מידע, לשאול שאלות וליצור שיתופי פעולה: https://chat.whatsapp.com/BubG8iFDe2bHHWkNYiboeU

Like
Reply

Thanks for sharing Nir Ben Eliahu! Once I work more with Python I'll definitely give this a follow up :) I find connecting to the DB through R really useful at times. Makes it easy to update the data, run your analysis quickly and maintain reproducibility when sharing/returning to your work. And the bonus -- Parameterized reports by passing values to your queries 🤯

Like
Reply

To view or add a comment, sign in

More articles by Nir Ben Eliahu

Others also viewed

Explore content categories