Combining Jupyter with SQL & R
This article explains how to set up Jupyter Notebook as a platform with MS SQL and R.
The steps that are described in the article are the result of my own experience and provide a complete explanation of the setup.
To follow this article, you should have basic knowledge of Jupyter Notebook, MS SQL and SSMS, and R (you can skip the R part if you want).
This article doesn’t explain how to use Jupyter notebook, SQL, or R.
If you are currently learning Python, SQL, or R, you have probably already thought about how to combine them.
For example, while learning SQL, you may have needed to import data from an Excel or CSV file, and that data might have been invalid, so you first had to clean it manually in Excel.
Or maybe you had a database in MSSQL (or another SQL server) and wanted to practice Python on that database, but how do you connect to it with Python?
There are many reasons to combine Python with SQL and R. One of them is improving the workflow: we can read datasets from files using Python, validate and visualize the data, and pass it to SQL to manipulate it easily. Or we may need to connect to our SQL server from Python to create visualizations.
This is also a great way to showcase your skills in all three languages, with Jupyter Notebook as the frontend for all of them.
Here is how you can do it.
Download and Install software
Let's begin by downloading all of the software, all of which can be found on Google.
For all installations in this article, the default settings are sufficient (meaning you can press the Next button until the installation is finished).
First, download R, R-Studio, and Rtools.
Start by installing R, followed by Rtools, and lastly R-Studio.
Open R-Studio.
Go to: Tools --> Install packages --> search for “tidyverse” and click install.
You can close R-Studio.
Next, download MS SQL and SSMS. First, install MS SQL, then, install SSMS.
After installing both, open SSMS:
Right-click on your SQL server connection and choose Properties.
Then, under Security, choose the server authentication option "SQL Server and Windows Authentication mode":
It is important to restart your computer so the new authentication mode takes effect.
Follow the next steps to create a SQL Server user ID with SQL Server authentication:
Click OK.
Next, we are going to set up an API with Open Database Connectivity (ODBC) to enable a connection to MSSQL.
Search the Start menu for "ODBC" and make sure to choose the 64-bit version (or 32-bit, according to your system specifications).
Choosing the correct bit version is crucial: the connection won't work if the ODBC version doesn't match your system architecture.
Launch ODBC.
Click on System DSN --> add
Choose a name and add a description.
Enter the server pathway. You can find it by simply opening SSMS and copying the server pathway in the popup screen of the login authentication.
Click next
Choose SQL Server authentication, and enter the username and password of the new user you created in the previous step.
Click next --> next --> finish
Next, install Anaconda, and set up Jupyter and all of the required libraries.
Download and install Anaconda.
Open the “anaconda prompt”, you can find it using the search.
Then, run the following code:
conda install conda
conda create --name Jupyter_SQL_R_env
conda activate Jupyter_SQL_R_env
conda install pandas jupyter bottleneck numexpr seaborn sqlalchemy sqlalchemy-utils pyodbc
conda update --all
pip install rpy2
pip install ipython-sql
pip install pyodbc
Jupyter_SQL_R_env is the name of the conda environment we create; you can choose any other name instead.
For this part, you should understand basic conda commands.
To activate the environment we've created:
conda activate Jupyter_SQL_R_env
Then, to open Jupyter Notebook:
jupyter notebook
After you finish working, terminate the server connection by pressing Ctrl + C several times to close all connections.
Open Jupyter Notebook and create a new notebook.
Start by importing the libraries:
# Import Python libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Import SQL extension libraries
import sqlalchemy
from sqlalchemy_utils import database_exists, create_database
import pyodbc
Then, establish a connection between Jupyter and MSSQL.
# If you followed all of the previous steps, you should have all of the details for the connection.
# To find out the Driver, you can go once again to ODBC from the search menu (once again, choose the right bit transfer version). Click on the Drivers tab and scroll down until SQL Server.
# note that the database can be an existing or a new non-existing database.
SERVER = r"DESKTOP-NIR\SQLEXPRESS"
DATABASE = "Article Database"
DRIVER = "SQL Server Native Client 11.0"
USERNAME = "ArticleUser"
PASSWORD = "ArticleUser123"
DATABASE_CONNECTION = f'mssql://{USERNAME}:{PASSWORD}@{SERVER}/{DATABASE}?driver={DRIVER}'
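One detail worth checking: the driver name contains spaces, which are not valid in a URL. If SQLAlchemy fails to parse the connection string, the driver name can be URL-encoded first. A minimal sketch (the driver name shown is just the one used above):

```python
from urllib.parse import quote_plus

# The driver name contains spaces, which must be escaped inside a URL
DRIVER = "SQL Server Native Client 11.0"
encoded_driver = quote_plus(DRIVER)
print(encoded_driver)  # SQL+Server+Native+Client+11.0
```

The encoded value can then be interpolated into DATABASE_CONNECTION in place of the raw driver name.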
Our next commands are going to create a connection variable. If the database that was chosen above exists, it will connect to it, but if it doesn’t exist, it will create a new database.
engine = sqlalchemy.create_engine(DATABASE_CONNECTION)
# if the database does not exist, sqlalchemy will create it
if not database_exists(DATABASE_CONNECTION):
    create_database(DATABASE_CONNECTION)
else:
    engine.connect()
# the connection command and variable
connection = engine.connect()
Load the extensions required to run SQL queries in jupyter.
%load_ext sql
%sql mssql+pyodbc://ArticleUser:ArticleUser123@mssql
# parameters: username = ArticleUser, password = ArticleUser123
# after the "@" comes the ODBC data source name; in my case, the name is "mssql"
Load the extensions required to run R commands in jupyter.
%load_ext rpy2.ipython
You have finished setting up Jupyter Notebook as a frontend platform for both SQL and R. Here's how it works.
Using SQL in Jupyter Notebook
By using %%sql inside a Jupyter cell, the entire cell becomes a SQL cell, and we can write a SQL query as if we were in SSMS.
Likewise, %sql is inline magic, which converts the line in the cell into a SQL line.
%%sql
SELECT * FROM Customers
That allows you to fully use DML, DDL, and DCL languages inside Jupyter.
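Beyond full SQL cells, ipython-sql also lets you capture a query's result in a Python variable with the inline %sql magic, and that result set can be converted to a pandas DataFrame. A sketch (notebook-only syntax; the Customers table from the example above is assumed to exist):

```python
# Runs only inside Jupyter with the sql extension loaded
result = %sql SELECT TOP 5 * FROM Customers
customers_df = result.DataFrame()  # ipython-sql result set -> pandas DataFrame
```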
Passing objects from SQL to Jupyter and vice versa
We can pass queries that fetch data from MSSQL using the connection variable.
In the following example, the result is stored in the variable query_Table, which we can then use inside Jupyter.
query = """
SELECT * FROM Table_Name
"""
query_Table = pd.read_sql_query(query, connection)
Note how we use triple quotes """ for a multiline syntax
We have created a python variable that is defined by a SQL query, and we can use query_Table inside Jupyter.
Variables queried from SQL can also be used for visualizations; this is a great combination of SQL with Python and R.
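The read_sql_query pattern can be tried end to end even without a SQL Server instance. A minimal sketch using an in-memory SQLite database as a stand-in for the MSSQL engine (the table name and sample data are hypothetical; the pandas calls are identical):

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite stands in for the MSSQL engine in this sketch
engine = sqlalchemy.create_engine("sqlite://")

# Seed a hypothetical table so the query has something to read
pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 52]}).to_sql(
    "Table_Name", con=engine, index=False
)

connection = engine.connect()
query = """
SELECT * FROM Table_Name
"""
query_Table = pd.read_sql_query(query, connection)
print(query_Table.shape)  # (2, 2)
```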
Passing data frames from python to MSSQL
We can also pass that same variable (query_Table), or any other data frame into a table in MSSQL.
query_Table.to_sql(name="Table_Name", schema="dbo", con=engine, if_exists="replace", index=False)
We can also use if_exists="append" to add rows instead of replacing the table.
engine is a variable that we have already defined to establish a connection to our database
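The difference between if_exists="replace" and if_exists="append" is easy to verify. A sketch against an in-memory SQLite engine (a stand-in for MSSQL; the schema= argument is omitted because SQLite has no "dbo" schema, and the table name and data are hypothetical):

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite stands in for the MSSQL engine in this sketch
engine = sqlalchemy.create_engine("sqlite://")
df = pd.DataFrame({"x": [1, 2]})

df.to_sql("t", con=engine, if_exists="replace", index=False)  # creates the table: 2 rows
df.to_sql("t", con=engine, if_exists="append", index=False)   # adds the same rows again: 4 rows
df.to_sql("t", con=engine, if_exists="replace", index=False)  # drops and recreates: 2 rows

n = pd.read_sql_query("SELECT COUNT(*) AS n FROM t", engine)["n"][0]
print(n)  # 2
```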
Using R in Jupyter Notebook
Just like the %%sql and %sql magic, we can use R.
By using %%R inside a jupyter cell, the entire cell becomes an R cell, and we can use R syntax. Likewise, %R is inline magic, which converts the line in the cell into an R line.
To create inline visualizations, we need to add another syntax.
In this example, we create a frame of 40x15 cm for the visualization.
%%R -i Python_data_frame -w 40 -h 15 --units cm
# Write your code in here
Hope you enjoyed this article.
Nir Ben Eliahu