"Topic: Data Abstraction-3 levels, Entity Relationship (ER) Model and Database Normalization"

"Topic: Data Abstraction-3 levels, Entity Relationship (ER) Model and Database Normalization"

Data Abstraction - 3 levels of Architecture in DBMS or Logical DBMS Architecture or 3 levels of Data Abstraction

Firstly,

What is Data Abstraction?

Data Abstraction = Data abstraction refers to the process of simplifying complex data structures or systems by focusing on the essential aspects and hiding unnecessary details. It involves representing data and its operations at a higher level of abstraction, making it easier to understand and work with.

The main purpose of this is," to Hide unnecessary details and provide an abstract view of the data for the end user".

So, now we have to know,

How many levels of abstraction are there to design the DBMS architecture effectively?

In the context of database management systems (DBMS), data abstraction is achieved through the use of different levels of abstraction.

In DBMS, data abstraction is performed in layers which means, there are three levels of abstraction/architecture in database management systems (DBMS). These provide a hierarchical structure that allows for effective design, management, and interaction with databases.

1) #Internal_Level (also known as Physical Level):

  • The physical or internal layer is the lowest level of data abstraction in the database management system. It is the layer that defines how data is actually stored in the database. It defines methods to access the data in the database. It defines complex data structures in detail, so it is very complex to understand, which is why it is kept hidden from the end user.
  • Data Administrators (DBA) decide how to arrange data and where to store data. The Data Administrator (DBA) is the person whose role is to manage the data in the database at the physical or internal level. There is a data center that securely stores the raw data in detail on hard drives at this level.

2) #Conceptual_Level (Logical Level):

  • The logical or conceptual level is the intermediate or next level of data abstraction. This level represents the overall logical structure and organization of the entire database. It explains what data is going to be stored in the database and what is the relationship between entities, their attributes, and the constraints that govern them.
  • The logical level or conceptual level is less complex than the physical level. With the help of the logical level, Data Administrators (DBA) abstract data from raw data present at the physical level. Changes made at the conceptual level impact the overall database structure and require careful consideration and planning.

3) #External_Level (View Level):

  • View or External Level is the highest level of data abstraction that focuses on the specific needs and requirements of individual users or groups of users. It represents the portion of the database that is relevant to a particular user's perspective. At this level, users define their own customized views of the data by specifying the desired queries. External level allows for data independence and provides a personalized view of the database for different users.

Example: Let's say we have a database table called "Employees" with the following columns: EmployeeID, FirstName, LastName, Email, and Salary.

At the physical level of abstraction, the data is stored in the database's underlying storage system, such as hard disks or solid-state drives. It involves how the data is physically organized and accessed, such as using file systems and disk blocks.

At the logical level of abstraction, we define the structure and relationships of the data. In our example, we create a logical schema for the "Employees" table, specifying the columns and their data types, primary key, and any constraints.

At the view level of abstraction, we define views that provide a customized and simplified representation of the data for specific users or applications. For example, we can create a view called "EmployeeDetails" that only includes the EmployeeID, FirstName, and LastName columns from the "Employees" table, hiding the Email and Salary columns. This view can be used by a reporting application to display employee details without exposing sensitive salary information.

Advantages of Data Abstraction:

  • Simplifies complexity: Data abstraction hides the unnecessary details and complexities of the underlying data structure, making it easier to understand and work with.
  • Enhances security: Data abstraction provides a layer of security by limiting direct access to sensitive data and allowing controlled access through well-defined interfaces.
  • Supports data independence: Data abstraction allows changes to be made to the underlying data structure without affecting the applications and programs that rely on it, providing flexibility and adaptability.
  • Facilitates modular development: Data abstraction enables modular development by dividing the system into separate components, allowing developers to work on different parts independently and promoting code reusability.

Disadvantages of Data Abstraction:

  • Potential performance overhead: Implementing data abstraction layers can introduce additional processing and overhead, potentially impacting system performance.
  • Increased complexity in design: Data abstraction requires careful design and planning to ensure the right level of abstraction and maintain consistency, which can add complexity to the system.
  • Potential data inconsistency: In some cases, data abstraction layers may introduce the possibility of data inconsistency if not properly managed and synchronized.
  • Learning curve and training requirements: Working with data abstraction may require additional training and understanding of the abstraction concepts, which can be a learning curve for developers and users.

Conclusion:

These three levels of abstraction provide a clear separation between the concerns of different users and enable efficient database management. They allow users to interact with the database at their respective levels of understanding and requirements, while ensuring data consistency, integrity, and security across the entire system.

Entity-relationship (ER) modeling:

In the context of entity-relationship (ER) modeling,

  • An entity is a real-world object or concept that has a distinct identity and can be identified uniquely.
  • An entity can be anything, such as a person, place, thing, event, or concept that we want to store data about in a database.
  • For example, in a database for a company, the entities could be employees, customers, orders, products, and so on.

ER Model:

  • A data model that describes a database design with the help of the Entity-Relationship diagram.
  • Purpose: It gives High level picture of Database.

ER Diagram:

  • An ER diagram is a graphical representation of entities (such as customers, orders, and products) and the relationships between them and distribution of data across a database.
  • It helps us design a database schema that accurately reflects the relationships between different data elements.
  • ER diagrams use various symbols to represent components of diagram such as entities, relationships, and attributes.

ER diagram symbols:

we use different notations to represent different elements of an ER diagram.

  • A Rectangle: To represent an entity.
  • A set of double rectangles: For the representation of a weak entity.
  • An oval: To represent an attribute.
  • A dashed oval: To represent derived attributes.
  • A set of double ovals: For the representations of multivalued attributes.
  • A quadrilateral or diamond shape: To represent a relationship.

Components of ER model - In Detail:

The 3 main components of the ER model include:

1) #Entity: An entity is a real-world object or concept that exists independently and can be uniquely identified. For example, in a school database, students, teachers, and courses are entities.

2) #Attribute: An attribute is a property or characteristic of an entity. An attribute is represented by ellipses in an ER diagram. For example, a student entity may have attributes such as name, roll number, age, address and gender.

Different types of attributes:

There are 4 types of attributes are there:

A) Key attribute:

  • It is an attribute used to uniquely identifies a row from an entity set.
  • The text of key attribute is always underlined with an oval shape.
  • Ex: student roll no can be used to uniquely identify a student from a group of students.

B) Composite attribute:

  • An attribute that is composed of other attributes is known as a composite attribute.
  • This attribute can be broken down into two or more parts.
  • It is represented with an oval shape and that attribute is further connected with other ovals.
  • Ex: Name is a composite attribute as it is represented with an oval and is also connected with other attributes such as first name, middle name, and last name.

C) Multivalued attribute:

  • An attribute that can possess more than one value is called a multivalued attribute.
  • The double oval is used to represent a multivalued attribute.
  • Ex: Phone number is represented as a multivalued attribute as it can contain more than one phone number for a single person and also Language known, one person can know more than one languages.

D) Derived attribute:

  • It is an attribute that does not exist in the table itself but is derived from other attributes or tables.
  • It can be calculated or determined based on the values of other attributes in the table or other tables in the database.
  • It can be represented using an oval with a dashed line, instead of a solid line, connecting it to the entity. The oval is labeled with the derived attribute's name, while the dashed line indicates that it is derived and not stored as a separate attribute in the database.
  • Ex: The total marks of a student obtained in all the courses represented as derived attribute, which is not explicitly stored in the database but can be calculated using the marks obtained in each course.

3) #Relationship: A relationship defines the association between two or more entities. For example, a student entity may have a relationship with a course entity to represent the fact that the student is enrolled in that course.

Entity set:

  • Entity is any object, class or component of data can be considered as an entity.
  • It can be either be a living or non-living component.
  • An entity is showcased as a rectangle in an ER diagram.

For example: A students studies a course, here both student and course are entities.

  • A group of entity is known as an entity set.

Relationship set:

  • In a relational database, a relationship is a connection between two or more entities.
  • It shows how entities are related to each other.
  • It can be represented by a diamond shape in an ER diagram.

For example: If each student can enroll in multiple courses, but each course can only be taken by one student at a time, then the relationship set between the "student" entity and the "course" entity can be called the "enroll in" relationship set.

  • A group of similar types of relationships is known as a relationship set.

Relationship Degree:

Relationship degree refers to the number of entities involved in a relationship. There are three types of relationship degrees: unary, binary, and ternary.

1) #Unary_RelationshipDegree:

  • In a unary relationship, there is only one entity involved in the relationship. This type of relationship is also known as a recursive relationship because an entity is related to itself.

Example: A company has employees, and each employee reports to a supervisor. The supervisor of an employee is also an employee of the company. In this case, the relationship degree is unary because the employee is related to another employee (the supervisor).

2) #Binary_RelationshipDegree:

  • In a binary relationship, two entities are involved in the relationship. This is the most common type of relationship in a database.

Example: A customer purchases a product from a store. In this case, the two entities involved are the customer and the product.

3) #Ternary_RelationshipDegree:

  • In a ternary relationship, three entities are involved in the relationship. This type of relationship is not as common as the binary relationship.

Example: A school has teachers, students, and classes. A teacher teaches a class, and a class has students enrolled in it. In this case, the three entities involved are the teacher, the class, and the student.

Types of Relationships - Mapping cardinalities

  • Mapping cardinalities defines the relationship between two entities in terms of the number of occurrences of one entity that are related to the number of occurrences of the other entity.
  • Each relationship has a minimum and a maximum cardinality.

1) #One_to_One_Relationship ((1:1) Mapping Cardinality):

This relationship is formed when a single element of an entity is associated with a single element of another entity.

For example, in a database for a school, a student has only one student ID, and a student ID is assigned to only one student. This is a one-to-one relationship.

2) #One_to_Many_Relationship ((1: N) Mapping Cardinality)):

This relationship is formed when a single element of an entity is associated with more than one element of another entity.

For example, in a database for a school, a teacher teaches many courses, but a course can only be taught by one teacher. This is a one-to-many relationship.

3) #Many_to_One_Relationship ((N:1) Mapping Cardinality):

This relationship is formed when more than one element of an entity is related to a single element of another entity.

For example, in a school database, many students can be associated with a single teacher. In this case, the "student" entity has a many-to-one relationship with the "teacher" entity, as many students can be related to a single teacher.

4) #Many_to_Many_Relationship ((N:M) Mapping Cardinality):

This relationship is formed when more than one element of an entity is associated with more than one element of another entity.

For example, a student course registration system where a student can register for multiple courses and a course can have multiple students enrolled in it. This represents a many-to-many relationship.

Database Normalization:

Database normalization is a process that is used to manage and organize a database into tables and columns to reduce data redundancy, unnecessary anomalies and improve data integrity. The normalization process involves breaking down large tables into smaller ones and establishing relationships between them to ensure that each piece of data is stored only once.

The main benefits of database normalization include:

  1. #Reduced_data_redundancy: By eliminating duplicate data, database normalization reduces the amount of storage space required and improves data consistency.
  2. #Improved_data_integrity: Normalization ensures that data is stored in a consistent and organized manner, which makes it easier to maintain and update.
  3. #Greater_flexibility: By breaking down large tables into smaller ones, normalization allows for greater flexibility in how data is queried and accessed.
  4. #Better_performance: Normalized databases are generally faster and more efficient than non-normalized databases, as they require less storage space and can be queried more easily.

However, there are also some potential downsides to database normalization, including increased complexity and slower query performance in some cases. It is important to balance the benefits and drawbacks of normalization when designing a database to ensure that it meets the specific needs of the application or organization.

Anomalies and types of anomalies:

In the context of database design, an anomaly is an unexpected or unintended issue that can occur in a database due to inefficient or incorrect normalization. Anomalies can cause inconsistencies in the data, lead to data redundancy, and make it difficult to perform updates, deletions, or insertions.

There are four types of anomalies that can occur in a database:

  1. #Data_redundancy: This occurs when two or more rows or columns have the same or repeated value, causing the memory to be used inefficiently. For example, in the employee table, the records of two employees are repetitive which results in data redundancy.
  2. #Insertion_anomaly: This occurs when we cannot insert a new record into a table because it contains a missing data element that is required by the table’s structure. For example, if we have a customer table with no orders yet, we cannot insert a customer into the table until they have made an order.
  3. #Deletion_anomaly: This occurs when deleting data in a table accidentally removes additional data that is needed by other tables in the database. For example, if we have an order table and we delete a customer, all orders associated with that customer will also be deleted.
  4. #Update_anomaly: This occurs when data is updated in a table, but the change is not made in all of the appropriate tables. As a result, the data becomes inconsistent. For example, if we have a customer table and we update the address of a customer, but not in the order table where their address is also stored.

Normalization is the process of organizing data in a database to reduce or eliminate data redundancy and prevent anomalies. The goal of normalization is to create a database that is efficient, easy to maintain, and avoids data inconsistencies.

To view or add a comment, sign in

More articles by Divya 🌻

  • "Topic: Performance Tuning and Optimization"

    Performance Tuning and Optimization Performance tuning and optimization are essential processes aimed at improving the…

  • "Topic: Data Visualization"

    Data Visualization Data visualization is the graphical representation of data and information, using charts, graphs…

  • "Topic: Data Analysis"

    Data Analysis Data analysis is the process of examining, cleaning, transforming, and interpreting data to discover…

  • "Topic: Data Processing"

    Data Processing Data processing is the systematic and automated transformation of raw data into meaningful and valuable…

  • "Topic: Data Ingestion"

    Data Ingestion Data ingestion is the process of collecting, importing, and loading data from various sources into a…

  • "Topic: Apache Spark"

    Apache Spark Apache Spark is a fast and general-purpose distributed computing system designed for processing…

  • "Topic: YARN (Yet Another Resource Negotiator)"

    YARN (Yet Another Resource Negotiator) YARN (Yet Another Resource Negotiator) is the resource management layer in the…

  • "Topic: HBase"

    HBase HBase is a distributed, scalable, and high-performance NoSQL database built on top of the Hadoop ecosystem. It…

Others also viewed

Explore content categories