Data Modelling, SQL , and Python. Tying it all together
Business application, data integration, data warehousing, big data, data lake, and machine learning. All of them share a vital part which is all of them have a data model. All product that entangled with data needs to have a solid data model. Why ? because otherwise the user do not know where is the data coming and going also on how the data connect to each other. A data model is essential in designing database.
Data model is a conceptual representation of data object. Including association between different data object and rules governing how the data object behaves. Data modelling is a process to create a data model in a visual form and make sure the data model is in line with business logic, regulatory compliance , and government policies. There is three types of data model which is :
1. Conceptual data model :
This is used define and organize informations that will be needed to develop a system. It is used by business stakeholder to communicate business structure and concepts also what the system will contains.
2. Logical data model :
This is used to define type, attributes, and relationships of the entities within the sysem. It is used by data architect and business analysts to communicate how the system should be implemented without taking any DBMS into consideration.
3. Physical data model
This is used to define the implementation of logical data model in a particular DBMS. It is used by database developers and analysts to enact the ideas expressed in conceptual and logical data model.
The types of data model differs on what their purposes, who uses it, and the level of detail in them.The level of detail is increased from conceptual, logical, and physical with the most detailed. Using all of them will give a more complete understanding of the system concept and business requirement. Below is an example of the three type of data model.
sources : Sri Pakash, types of data model
There are two types of data modelling technique :
1. Entity Relationship Model (E-R)
2. Unified Modelling Language (UML)
To complete my assignment from IYKRA about data modelling, I used the ER model to create an Entity Relationship Diargam(ERD) of physical data model of dvdrental database. The database used in this assignment is a postgre database about DVD rental.
Here is the rules I defined for the data objects :
A. 1 customer can only registered at 1 store
B. 1 customer have to at least have 1 payment but they can have many payment as well
C. 1 customer have to at least do 1 rental but they can have many rental as well
D. 1 payment can consist of a several rental, but 1 rental can not be divided into a several payment
E. 1 staff can handle no rental at all or can do up to many rental
F. 1 staff can handle no payment at all or can do up to many payment
Recommended by LinkedIn
G. 1 staff can only be placed in 1 store
H. 1 rental can access many inventory in 1 store and 1 inventory in 1 store can supply many rental.
I. 1 store can have 0 or many inventory, but 1 inventory can not be stored in many store.
J. 1 film can be in many inventory in many store. But 1 inventory can not contain many type of film
K. 1 film can only exist in 1 film_category and 1 film_category can only represent 1 film.
L. 1 film can only exist in 1film_actor and 1 film_actor can only represent 1 film.
M. 1 film can not contain many language, but 1 language can exist in many film.
For the implementation of the data modelling, and to complete an assignment from IYKRA about python and SQL, i use PostgreSQL database that connected to python using library psycopg2. The steps I take to implement it in python are :
1. Create the dvdrental database in PostgreSQL.
2. Create the tables according to data model in the database.
3. Insert the values into the corresponding columns and tables.
4. Import the necessary library into python.
5. Connect into the PostgreSQL dvdrental database.
6. Create a cursor object.
7. Execute a join SQL queries in python.
8. Check if it is working according to expectation.
9. Close the connection to database.
Here is a snippet of my code that i use in the assignments :
#import library to connect to postgresql
import psycopg2
#connect to dvdrental db
conn = psycopg2.connect(
host="127.0.0.1", #localhost
port="5432",
dbname="dvdrental",
user="postgres",
password="postgres")
print("Opened database successfully")
#create cursor object
cur = conn.cursor()
#view name, email, and the total amount of transaction of the 5 top customer
cur.execute('''
SELECT first_name, last_name, email, SUM(amount) AS spend_amount
FROM customer AS c
INNER JOIN payment AS p
ON c.customer_id = p.customer_id
GROUP BY first_name,last_name,email
ORDER BY spend_amount DESC
LIMIT 5;''')
#print the result of the query
rows = cur.fetchall()
for row in rows:
print("date = ", row[0])
print("sale = ", row[1],"\n")
print("Operation done successfully")
#close connection
conn.close()
For the full code, SQL queries, and the database files feel free to check out my github repositories : https://github.com/hpraditya/sql-dvdrental.git . I am still learning on this topic, but if you have any question or want to have a discussion, feel free to contact me through my email praditya.humam@gmail.com or message me through my Linkedin.
References :
2017, Dale Anderson, Data Model Design and Best Practices – Part 1, https://www.talend.com/blog/2017/05/05/data-model-design-best-practices-part-1/
2021, Robert Agar, Stages and Types of Data Models, https://tdan.com/stages-and-types-of-data-models/28201
Data Analyst
4yKeren Mas Dito