Data Modelling, SQL , and Python. Tying it all together
Photo by Christina Morillo from Pexels

Data Modelling, SQL , and Python. Tying it all together

Business application, data integration, data warehousing, big data, data lake, and machine learning. All of them share a vital part which is all of them have a data model. All product that entangled with data needs to have a solid data model. Why ? because otherwise the user do not know where is the data coming and going  also on how the data connect to each other. A data model is essential in designing database.

 

Data model is a conceptual representation of data object. Including association between different data object and rules governing how the data object behaves. Data modelling is a process to create a data model in a visual form and make sure the data model is in line with business logic, regulatory compliance , and government policies. There is three types of data model which is :

1.      Conceptual data model :

This is used define and organize informations that will be needed to develop a system. It is used by business stakeholder to communicate business structure and concepts also what the system will contains.

2.      Logical data model :

This is used to define type, attributes, and relationships of the entities within the sysem. It is used by data architect and business analysts to communicate how the system should be implemented without taking any DBMS into consideration.

3.      Physical data model

This is used to define the implementation of logical data model in a particular DBMS. It is used by database developers and analysts to enact the ideas expressed in conceptual and logical data model.

 

The types of data model differs on what their purposes, who uses it, and the level of detail in them.The level of detail is increased from conceptual, logical, and physical with the most detailed. Using all of them will give a more complete understanding of the system concept and business requirement. Below is an example of the three type of data model.

No alt text provided for this image

sources : Sri Pakash, types of data model

There are two types of data modelling technique :

1.      Entity Relationship Model (E-R)

2.      Unified Modelling Language (UML)

To complete my assignment from IYKRA about data modelling, I used the ER model to create an Entity Relationship Diargam(ERD) of physical data model of dvdrental database. The database used in this assignment is a postgre database about DVD rental.

 

No alt text provided for this image


Here is the rules I defined for the data objects :

A. 1 customer can only registered at 1 store

B. 1 customer have to at least have 1 payment but they can have many payment as well

C. 1 customer have to at least do 1 rental but they can have many rental as well  

D. 1 payment can consist of a several rental, but 1 rental can not be divided into a several payment

E. 1 staff can handle no rental at all or can do up to many rental

F. 1 staff can handle no payment at all or can do up to many payment

G. 1 staff can only be placed in 1 store

H. 1 rental can access many inventory in 1 store and 1 inventory in 1 store can supply many rental.

I. 1 store can have 0 or many inventory, but 1 inventory can not be stored in many store.

J. 1 film can be in many inventory in many store. But 1 inventory can not contain many type of film

K. 1 film can only exist in 1 film_category and 1 film_category can only represent 1 film.

L. 1 film can only exist in 1film_actor and 1 film_actor can only represent 1 film.

M. 1 film can not contain many language, but 1 language can exist in many film.

For the implementation of the data modelling, and to complete an assignment from IYKRA about python and SQL, i use PostgreSQL database that connected to python using library psycopg2. The steps I take to implement it in python are :

1.      Create the dvdrental database in PostgreSQL.

2.      Create the tables according to data model in the database.

3.      Insert the values into the corresponding columns and tables.

4.      Import the necessary library into python.

5.      Connect into the PostgreSQL dvdrental database.

6.      Create a cursor object.

7.      Execute a join SQL queries in python.

8.      Check if it is working according to expectation.

9.      Close the connection to database.

 

Here is a snippet of my code that i use in the assignments :

#import library to connect to postgresql

import psycopg2

 

#connect to dvdrental db

conn = psycopg2.connect(

    host="127.0.0.1", #localhost

    port="5432",

    dbname="dvdrental",

    user="postgres",

    password="postgres")

 

print("Opened database successfully")

 

#create cursor object

cur = conn.cursor()

 

#view name, email, and the total amount of transaction of the 5 top customer

cur.execute('''

SELECT first_name, last_name, email, SUM(amount) AS spend_amount

FROM customer AS c

INNER JOIN payment AS p

ON c.customer_id = p.customer_id

GROUP BY first_name,last_name,email

ORDER BY spend_amount DESC

LIMIT 5;''')


#print the result of the query

rows = cur.fetchall()

for row in rows:

   print("date = ", row[0])

   print("sale = ", row[1],"\n")

 

print("Operation done successfully")

#close connection
        
conn.close()        

 

For the full code, SQL queries, and the database files feel free to check out my github repositories : https://github.com/hpraditya/sql-dvdrental.git . I am still learning on this topic, but if you have any question or want to have a discussion, feel free to contact me through my email praditya.humam@gmail.com or message me through my Linkedin.

 

References :

2017, Dale Anderson, Data Model Design and Best Practices – Part 1, https://www.talend.com/blog/2017/05/05/data-model-design-best-practices-part-1/

2021, Robert Agar, Stages and Types of Data Models, https://tdan.com/stages-and-types-of-data-models/28201

Keren Mas Dito

Like
Reply

To view or add a comment, sign in

More articles by Praditya Humam

  • Pandas Dataframe vs Spark Dataframe

    One of python main attractive point is its vast amount of choice for libraries and resource from the open-source…

  • Choosing SQL or NoSQL

    When building an application, one of important decision to take is choosing database to use. As the wrong selection of…

  • Exploring Services From Google Cloud Platform

    Google Cloud Platform (GCP) is one of the largest cloud provider compueting with AWS from Amazon and Ms. Azure from…

  • Estimating ETA with Machine Learning : a Supervised Learning Case

    In the era of on-demand ecosystem where brand-customer relationship is increasingly important, a promise on when the…

  • Designing ETL Pipeline Principle

    Data engineering task nowadays is not only consist of moving data from one source to another or just creating and…

  • Introduction to Orchestration

    To explain orchestration it is best to use a musical analogy, where it borrows the term from. Orchestration software…

  • Computer Vision for Image Classification

    Computer vision is a field within artificial intelligence(AI) that thrive to make computers to be able to look at image…

  • Text Classification Summary

    Text classification is one of basic task in Natural Language Processing(NLP) that have many application, such as…

  • Summary of Tableau Level of Detail (LOD)

    On Tableau the term level of detail is describing how detail your current visualization view. Level of detail (LOD) is…

  • Summary of Intro to Natural Language Processing & Text Mining

    One of the branch of computer science is Artificial Inteligence (AI), and one of the branch of AI is something called…

Others also viewed

Explore content categories