What Data Should I Store: Database Design

What Data Should I Store: Database Design

Once the question of what database management system should be used is answered, the next issue is what data you want to keep in it.

There are a number of techniques for making this decision.  The most popular approach is Entity-Relationship modeling, using Entity-Relationship Diagrams (ERDs).

Entity is a fancy word for “thing”.  Software people love using such words (see “object”, “widget”, and “gadget”, all of which have special meanings in software).  An entity represents a particular kind of thing or person, often in the real world.  Entities might include: car, employee, building, office, shipment, product, etc.  Notice that these are all nouns.  That isn’t accidental, as we shall see.

Relationships describe ways in which two kinds of things may be related to one another.  For example, a Car “Is Driven By” a Person (or equivalently, a Person “Drives” a Car).  Relationships are naturally described by verb phrases.  Entities play one or more “roles” in a relationship, e.g., Person plays the role of Driver.

One approach to creating a concept of the database design is by reading through a description of the activities to be automated, looking for nouns and verb phrases that connect them.  For example, consider:

The customer selects a product to purchase.  Several products may be selected as part of one order.  Usually, all of the products on one order are placed on a single shipment, but sometimes, they have to be spread across more than one shipment, if the products are not all available at the same warehouse.

Even in this short paragraph, we can identify the following entities:

  • Customer
  • Product
  • Order
  • Shipment
  • Warehouse

There are also the following relationships defined:

  • (Customer) Purchases (Product)
  • (Customer) Requests (an Order)
  • (Product) Is Part of (an Order)
  • (Product) Is Placed on (a Shipment)
  • (Product) Is Available At (a Warehouse)

Relationships may relate one object to one other, one object to many others, or even many objects to many objects.  For example, a particular product may show up on many orders, and a single order may have many products.  On the other hand, while a customer may make many orders, each order is for only one customer.

This methodology dates back to the work of Peter Chen and Wesley Chu, and was first published in the very first issue of the Association for Computing Machinery’s Special Interest Group on Management of Data’s SIGMOD Record.

There are many variations on the diagrams.  In some the relationships always involve only two roles, and are represented by lines with special connectors at the end.  In some, the relationships are represented by diamonds, and the roles by lines, allowing for relationships with more than two roles.

This figure illustrates not only the original (now old-fashioned) form of E-R diagrams, but also an important fact about relationships.  The same two entities can have many different relationships between them.  Many designers leave their relationships unnamed, or give them simplistic names (like Person-Car).  Doing that makes the relationship intended ambiguous.

That ambiguity can lead to no end of problems.  In a previous job, I designed a tool that let you create databases directly in terms of entities and relationships (implemented on top of Oracle).  As part of the training for using this tool, users were trained in E-R modeling (as supported by the tool).  The main trainer came to me during lunch with a problem from the class: they could not agree whether or not there could be more than one user per terminal in the User-Terminal relationship.

After chiding him for letting his students use a name like that (which did not represent a verb phrase), I pointed out that the problem was likely because they were thinking about different relationships between the user and the terminal.  The group that said there could be more than one was probably imagining a relationship like “Is Allowed to Use”, while the other group might have been imagining “Is Using (Right Now)”.  These relationships assert different facts about the entities.  Letting that ambiguity make it all the way into production risks many problems from inconsistent understandings.

Once you have determined which entities and relationships there are, you can identify the facts you want to know about each.  For a car, you might want to know its vehicle identification number, color, year of manufacture, etc.  For a person, it might be the first and last name, telephone number, date of birth, social security number, gender, email address, etc.  These become your initial list of attributes for that entity and relationship.

Some of these attributes identify which particular thing of that type you are talking about.  These are the attributes which will form the primary key (or at least be a candidate to be a primary key).

Representing Entities/Relationships in Relational Form

Deriving an initial database design from an ERD is pretty simple.  For each entity, you start with one table.  That table has one column for each of the attributes that you have defined for that entity.  The identifying attributes become what is called the Primary Key for that table.

Relationships can be handled in two ways.  If the relationship is one to one or one to many, and has no attributes that belong to the relationship itself, simply add the identifying attributes of the entity in the role which has one entity for many of the other role into the table representing the entity in the other role.

In other cases, you build a separate table for the relationship itself.  This table has columns for the identifying attributes of all the entities in all of the roles, to form the primary key.  Any attributes of the relationship itself are then added.

In future notes, we will see that this initial design can often be improved on.

To view or add a comment, sign in

More articles by Gary Page

  • Designing for Maintainability

    Designing for Maintainability More time and effort is spent on maintaining software than on developing it in the first…

  • The Mighty Warrior Models Classes

    The Mighty Warrior Models Classes The mighty warrior reached into her pack, and pulled out her favorite weapon, the…

  • What Do I Need in a Database?

    What Do I Want In A Database? I once took a day off from my normal activities at my work to help my wife’s employer…

    1 Comment
  • Getting It Done Quick and Cheap

    Getting It Done Quick and Cheap What do you do when your boss says, “That is just too long!” or “We can’t afford…

  • My Neighbor Bob and The SDLC

    My Neighbor Bob and the SDLC One evening on the way into the house, I talked to my neighbor, Bob. In the course of our…

  • Good Requirements

    Good Requirements In order to build a system or business process to do what your company needs, someone needs to write…

  • The Mighty Warrior Gathers Information

    The Mighty Warrior Gathers Information Her long raven tresses cascading down the back of her silver armor, the mighty…

    1 Comment
  • The Information Factory

    The Information Factory Several years ago, I was working for a client who needed help planning an internal network. My…

  • Draining the Swamp

    Draining the Swamp When we try to simplify business processes, and make things more efficient, we are often surprised…

    1 Comment

Explore content categories