What Do I Need in a Database?

What Do I Want In A Database?

I once took a day off from my normal activities at my work to help my wife’s employer out.  They know they need “a database”, but don’t really know what they want to put in it.

This is a common place to start the adventure.  The potential users may not know precisely what they want to accomplish, but they know what tool they want to use.

It is also the wrong place to start.

Stepping Back

This tendency to start with the solution, and skip figuring out what the problem is, seems to be a powerful force in human beings.  Nearly every user I have dealt with over the last 25 years had started with “requirements” which implied the design of the solution.  Most of my IT colleagues haven’t helped either.  They also start out thinking in design.

I have learned to resist, to step back to an earlier stage.  This is harder than it sounds, because I have to exert a subtle resistance to the relentless flow toward premature design.

Why Do I Bother?

If I let the rush toward design proceed, the design that is reached may well be the wrong one.  Even at the broad level of deciding to use a DBMS, I may be making a mistake.

After all, a DBMS is not the only place in which data can be stored.  Each of the other possibilities has its own advantages, and situations in which it is the best solution.  If they didn’t, no one would have bothered to create them.

As an example, consider using a spreadsheet.  A spreadsheet can store data in simple rectangular layouts that mimic a table in a database.  That data can be sorted or filtered, much as we can in a database.  It can be graphed.  It can support calculations more easily and powerfully than in a database.  Also, I don’t have to be rigid about the table-like structure.  If I need a few loose individual values, I can just stick them in.  I don’t have to define the columns I use before I use them.

Sometimes, using a spreadsheet is the right answer.  Sometimes, using a database is.  If I don’t step back and ask what we are trying to accomplish, I won’t know whether I am using the right tool.

Even after I have concluded that there is overwhelming evidence that a database will need to be a part of the solution, I need to keep focusing on what we want to do, before we move on to how we will do it.

When is a Database Right?

Certain characteristics of the problems to be solved suggest that a database tool is the right thing to use.

The need to sort or filter the data certainly suggests that a database might be a solution, but there are other tools which can do both of those things (including spreadsheets, as noted above).

One clue is that the information to be entered has to have high integrity.  If it is important to check that the values in a certain field or column fall in a particular domain, a database will have significant advantages.

Another clue is that there is a large amount of information.  Databases are able to store large quantities of information, and support special data structures that make relatively rapid access to large amounts of data possible.

Yet another clue is that a large number of people (or systems) will need to access the same data, at more or less the same time.  Database management systems have special features for arbitrating between users who are trying to get to the same data at once.

How Do I Figure Out What Is Required?

The core of analyzing requirements for a database is the same as for analyzing requirements for any other system.  In another post, we will look at some object oriented methods for doing this (using UML).  In another note we will look at a popular database oriented technique.  However, the fundamentals aren’t about any particular technique or methodology.  While you have probably considered this in software development or analysis, I would like to cover some of the basics as I understand them, again.

First, you need to gather the requirements from people who have a stake in how it will work (or, at worst, people who understand those people).  We refer to people who have a stake in how the system will work as stakeholders.  This usually includes the users of the system, but there are often other stakeholders as well.  These can include regulators, support personnel, customers on whose behalf the system is being used, and even innocent bystanders who are affected by the system.  For example, a database that tracks the credit of individuals may affect those individuals even if they are unaware of its existence.

Thus, a good starting place is to try to understand who the stakeholders are (at least by type).  Every kind of stakeholder has needs about the way the database will operate, which may be different from the needs of any other kind of stakeholder.  If we fail to recognize the existence of a class of stakeholders, we are almost guaranteed to fail to understand their needs.

From there, we have to ask about what they need the system to do.  This is a point where the drive to design will take over.  The stakeholders, or their representatives, will almost surely state their needs in terms of how they want the system to work, not what they want it to do.  That is, they will be implicitly designing the system for you.

This is where you need to remember that you are the one who knows how to design systems, not the stakeholders.  You can handle this one of two ways.  First, you can gently redirect the stakeholders to explore what they are trying to achieve by having it work that way.  This will allow you to record not what they asked for (likely a design), but the requirement behind it.

Alternatively, you can record what they said as a need, but recognize that it is not a requirement.  This allows you to keep track of what they have asked for, but later go back to explore what they were trying to achieve with it.

Either way, you must not simply record the design expressed by the stakeholder as if it were a requirement.  There are dragons that way.

A good requirement is unambiguous, testable, and describes what to do, not how to do it.  We’ve already discussed the third of these properties.

A requirement is unambiguous if there is only one way to interpret it.  Requirements that are ambiguous can lead to implementation of the wrong interpretation of the requirement, and strife between developers and users over whether the system works correctly.

A requirement is testable if there is some way to demonstrate that the system meets it.  Requirements that can’t be tested can also lead to strife, where the developers claim that the requirement has been met and users claim it hasn’t.

One interesting technique in requirements development is to create the tests in parallel with the requirements.  That allows you to be sure that your requirements can be tested, and makes it clearer how you understood the requirements.

What Kinds of Requirements Do I Need?

In planning for a database, we need to determine what information needs to be tracked in the database, as well as what methods of collecting and regurgitating that data there will need to be.

While these are separate goals of the process, they are linked to one another.  If the database is to produce a piece of data in its output, it must have that data, or something it can be derived from.  If the database is to have that data, there must be some form of input that provides it (or provides something that it can be derived from).

There may also be requirements that are not about the function of the system.  Instead, these requirements are about qualities of the way the functions are provided.  In the world of databases, some of these are very important.

Performance requirements include throughput, response time and capacity requirements.  They can be hard to state well.  A response time requirement defines how long it takes to perform a particular operation.  For it to be unambiguous, you need to specify:

  • The points in the process between which time will be measured
  • The size of the affected parts of the database when the measurements are done
  • The load on the system during the measurements.

User requirements are obviously focused on response times from the user interface perspective.  Thus, the points of measurement are often implicitly something like “from the time I click OK until the response is completely displayed”.

Why do we care about the size of the database at the time?  After all, the users don’t.  They want this speed no matter how large the database becomes.  Unfortunately, essentially all methods for performing database operations have performance that gets worse as the size of the database increases (sometimes only a little, and sometimes rapidly).  We need to know the size we have to deal with for the requirement to be meaningful (and testable).

As a side note, one of the big mistakes in database testing is to test against a database that is too small.  When you do this, you can miss serious performance problems that will be obvious in production.

Load is equally critical.  While they may not realize it, users are always thinking about performance as it will exist in the production system.  That will actually vary over time, depending on how much is being demanded of the system at the time.  In order to meaningfully test performance, we need to know what the expected load is like (how many of what types of queries or updates).  Often, we think of two cases: average load and peak load.  We need to make clear which we are talking about, and what the load will be at that time.  This is all guesswork, of course.

Throughput is a related measurement of how many operations can be performed in a given period of time.  It is related to, but different from response time.  Indeed there are some design choices that can make response time better at the cost of worse throughput!

Throughput requirements are also only meaningful in the context of a certain database size and other load.

Capacity requirements define how much information of certain types needs to be stored.  This can be expressed in terms of the number of records, but may also involve the average or peak size of some columns (especially ones that can be very large).

Reliability, availability and maintainability requirements also are important for a database.  These are linked measurements.

Reliability requirements state how often the database (or specific services it provides) may cease to operate.  Maintainability requirements say how long it can be down when it does go down.  Availability requirements say what fraction of the time it must be available.

These are usually expressed in terms of averages.  Reliability is typically expressed in the Mean Time Between Failures (MTBF).  Maintainability is typically expressed in the Mean Time To Repair (MTTR).  Availability is typically expressed as a percentage.

Systems with very tight requirements in these areas can require much more complex techniques to be able to meet them.  Designing to address these can require looking at particular failure modes (disk failures, CPU failures, network outages, software errors, shutdowns to install new software or take backups, etc.).  Use of redundant hardware (e.g., RAID drives, or even snapshots on different machines) may be needed.  For extreme cases, you may need replicated databases that are kept in sync by special software (and automatically restored after one copy goes down).  I’ve actually done that (it is not easy).

Usability requirements are also often important.  These can define things like:

  • What kinds of skills the users will have
  • How quickly the users must be trained
  • What styles of interaction the users are used to

Adaptability requirements define what kinds of changes need to be expected for the system.  This can be critical for database design.  Performance may be best for a database design that is “hard-coded”, using separate columns for each different piece of information (e.g., a column for the section name, one for the department name, and one for the division name).  However, if the business realities being modeled change quickly, you may be better off with a more adaptable structure (e.g., with a separate table listing the types of organizations, and logic based on that table).  This kind of “data-driven” design can be trickier to start out with, and a bit slower, but may be much easier to change.  Adaptability requirements would define areas where the structures are likely to change often.

Security requirements may be thought of as functional requirements, or handled separately.  They can include:

  • Requirements on access by people or systems according to the roles they play.
  • Requirements on what records should be kept for audit purposes.
  • Requirements on what granularity of access control is needed.

Conclusion

It is very important to know what you need, before becoming entangled in how to provide it.

Now that brings back some memories.

Like
Reply

To view or add a comment, sign in

More articles by Gary Page

  • Designing for Maintainability

    Designing for Maintainability More time and effort is spent on maintaining software than on developing it in the first…

  • The Mighty Warrior Models Classes

    The Mighty Warrior Models Classes The mighty warrior reached into her pack, and pulled out her favorite weapon, the…

  • What Data Should I Store: Database Design

    What Data Should I Store: Database Design Once the question of what database management system should be used is…

  • Getting It Done Quick and Cheap

    Getting It Done Quick and Cheap What do you do when your boss says, “That is just too long!” or “We can’t afford…

  • My Neighbor Bob and The SDLC

    My Neighbor Bob and the SDLC One evening on the way into the house, I talked to my neighbor, Bob. In the course of our…

  • Good Requirements

    Good Requirements In order to build a system or business process to do what your company needs, someone needs to write…

  • The Mighty Warrior Gathers Information

    The Mighty Warrior Gathers Information Her long raven tresses cascading down the back of her silver armor, the mighty…

    1 Comment
  • The Information Factory

    The Information Factory Several years ago, I was working for a client who needed help planning an internal network. My…

  • Draining the Swamp

    Draining the Swamp When we try to simplify business processes, and make things more efficient, we are often surprised…

    1 Comment

Others also viewed

Explore content categories