Adding excellence to your data handling operations
As a database architect, you want to design and improve data systems. You want to create powerful tools to help move your business operations forward. You do NOT want to be the custodian of a problematic data feed. You don't want to waste time answering that same old question: "Where is my data?" You do not want to be the scapegoat for delays or inaccurate results.
To this end, you should consider creating this simple feature to add manageability and introduce excellence to all your data operations. Of course, it is best to include this feature from the start, but it is rarely too late to add it later.
The feature is storing the runtime result metrics as data in your database. Store the data required to identify which data set was processed, by which program, at what time, and, if it failed, why. Record the input record counts, success counts, error counts, error details, and more as needed. Having that data ready for search in a database will provide instant operational awareness. With this data, a few queries or reports will give you mastery of the day-to-day operations. It will drastically reduce the time you spend maintaining, investigating, and debugging a system after release. There is a lot more you can do with this data; I will write about that in later articles.
You can implement this feature with a small set of database objects. It can be as simple as three tables and a half dozen stored procedures. Once these basic pieces are in place, use them in all your database operations and applications. For legacy operations, it may be possible to insert the stored procedure calls into existing logging or trace procedures. This way, you will avoid the need to update all your old code.
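For example, that legacy hook might look like the following minimal sketch (T-SQL flavor). Log_Message, trace_log, and all parameters are hypothetical stand-ins for whatever your existing logging procedure looks like, and the metric procedures it calls are the ones described later in this article:

```sql
-- Hypothetical legacy hook: an existing Log_Message procedure,
-- extended to also feed the new metrics procedures.
-- Log_Message, trace_log, and all parameters are illustrative.
ALTER PROCEDURE Log_Message
    @run_id   INT,             -- assumed: handle from Procedure_start()
    @severity VARCHAR(10),
    @message  VARCHAR(4000)
AS
BEGIN
    -- Existing behavior: write to the old trace table, unchanged.
    INSERT INTO trace_log (severity, message, logged_at)
    VALUES (@severity, @message, GETDATE());

    -- New behavior: forward errors into the metrics tables as well,
    -- so legacy jobs appear in the same operational reports.
    IF @severity = 'ERROR'
        EXEC Error_Add @run_id, @severity, @message;
END;
```

The old callers keep working unchanged, and every error they log now also lands in the metrics tables.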
Next, wrapping an API around the stored procedures will raise the usability to the next level. With an API, you can collect metrics from a much broader range of your operations. Add API calls to your data collection scripts, transport jobs, and cleansing applications. You can create parsing scripts to gather the result metrics from log files and load those metrics into your database. Collecting data from a broader range of applications will extend your control. Ideally, each step of the data life cycle will store the operational results in a single database.
Why put that data into a database:
- Once the metrics are in the DB, they are hard to lose.
- You already have database skills and servers available.
- Everyone can find the data there.
- Database search and reporting tools are widely available and well known.
- The database will provide fast and sophisticated searching.
- The database can analyze multiple dimensions of performance for each application and data source.
- The database can join records together to trace a data set from one step to the next. Use this to find the location and cause of exceptions (see the query sketch just below).
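As an illustration, a trace query can be this small. This is a minimal sketch assuming the application_run and procedure_run tables from the data model later in this article; the data set name is made up:

```sql
-- Trace one data set across applications and steps to find where it
-- failed. Assumes the application_run and procedure_run tables
-- sketched later in this article.
SELECT a.application_name,
       p.procedure_name,
       p.start_time,
       p.finish_time,
       p.source_records,
       p.error_count
FROM   application_run a
JOIN   procedure_run   p ON p.application_run_id = a.application_run_id
WHERE  a.data_set_name = 'customers_2020_06_01.zip'
ORDER  BY p.start_time;
```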
Why one database:
- You will have a consistent model for the set of measurements you need to analyze.
- You can use one search across many platforms.
- You will know the whole data flow and dependency life cycle.
But you already have log files:
- Where are all the many log files for all the different applications located?
- Who can search the log files? What access permissions and skills are required?
- How long are those logs saved?
- What keywords will you use to search each of the different log files?
- Can you link the log results together to trace a data set across the applications?
- What kind of analytics can you perform on the contents of those logs?
Business reasons:
- Reduce costs/time spent supporting your existing data operations.
- Reduce costs/time spent on user support.
- Reduce costs/time spent on debug efforts.
- Proactively identify bottlenecks, failure points, and SLA violations.
- Improve user satisfaction: Let your users know where their data is and why.
- Establish back end operations reporting: How many records do you process in a day? Do you have a backlog? (See the report query after this list.)
- Improve data quality by verifying the arrival or failure of data.
- Find exceptions before your end-users find them.
- CYA
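For instance, that daily throughput report is a single query. This is a minimal sketch assuming the hypothetical procedure_run table from the data model below:

```sql
-- Daily throughput report: records in, records processed, errors.
-- Assumes the hypothetical procedure_run table sketched below.
SELECT CAST(p.start_time AS DATE) AS run_day,
       SUM(p.source_records)      AS records_in,
       SUM(p.success_count)       AS records_ok,
       SUM(p.error_count)         AS records_failed
FROM   procedure_run p
GROUP  BY CAST(p.start_time AS DATE)
ORDER  BY run_day DESC;
```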
Start with a flexible relational data model.
You will need at least three levels of a relational table hierarchy:
- Application
- Procedure
- Error
These will support a reasonably detailed data model. You will need some flexibility about which level each attribute belongs to and what format it takes. For example, a data set may be the name of a large archive file, or just a sub-component data structure inside one. Error tracking will probably be the finest level of detail. Depending on data volume, it may be best to aggregate the error metrics by type and store a single example of the problem data for analysis. A large text field is handy for that.
Some of the runtime metrics you might store are:
Application Level
- Application name
- Host
- DB name
- Data source name
- Data set name
- Start time, Finish time
- Result status
Procedure Level
- Procedure name
- Action or execution step
- Data set name
- Data target
- Start time, Finish time
- Number of source records
- Number of successful operations
- Number of inserts
- Number of updates
- Number of errors
Error Level - consider aggregates grouped by error type
- Error type
- Stack Trace
- Error message
- Record ID
- Problem data sample
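Here is one possible shape for that three-level hierarchy, written in T-SQL. This is a minimal sketch: the table and column names are illustrative, and you should adapt types and sizes to your platform and volumes.

```sql
-- One possible shape for the three-level hierarchy (T-SQL flavor).
-- Names are illustrative, not prescriptive.
CREATE TABLE application_run (
    application_run_id INT IDENTITY PRIMARY KEY,
    application_name   VARCHAR(128) NOT NULL,
    host_name          VARCHAR(128),
    db_name            VARCHAR(128),
    data_source_name   VARCHAR(256),
    data_set_name      VARCHAR(256),
    start_time         DATETIME NOT NULL,
    finish_time        DATETIME,
    result_status      VARCHAR(32)      -- e.g. RUNNING, DONE, FAILED
);

CREATE TABLE procedure_run (
    procedure_run_id   INT IDENTITY PRIMARY KEY,
    application_run_id INT NOT NULL
                       REFERENCES application_run (application_run_id),
    procedure_name     VARCHAR(128) NOT NULL,
    step_action        VARCHAR(128),
    data_set_name      VARCHAR(256),
    data_target        VARCHAR(256),
    start_time         DATETIME NOT NULL,
    finish_time        DATETIME,
    source_records     INT,
    success_count      INT,
    insert_count       INT,
    update_count       INT,
    error_count        INT
);

CREATE TABLE run_error (
    run_error_id       INT IDENTITY PRIMARY KEY,
    procedure_run_id   INT NOT NULL
                       REFERENCES procedure_run (procedure_run_id),
    error_type         VARCHAR(128),
    error_message      VARCHAR(4000),
    stack_trace        VARCHAR(MAX),
    record_id          VARCHAR(128),
    error_count        INT DEFAULT 1,   -- supports aggregation by type
    data_sample        VARCHAR(MAX)     -- the large text field for one
                                        -- example of the problem data
);
```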
Stored Procedures
CAUTION: Be extra careful to handle exceptions within each of these stored procedures. Without that protection, you can end up looping on Error_Add or hanging your operations. See the sketch after this list.
- Application_start()
- Application_done()
- Procedure_start()
- Procedure_update()
- Procedure_done()
- Error_Add()
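As an illustration of that caution, here is a minimal sketch of Error_Add, assuming the run_error table sketched above. The point is the CATCH block: the procedure must never raise an error back to its caller.

```sql
-- Minimal sketch of Error_Add with the protection described above:
-- it must never raise, or the caller's error handler could loop.
CREATE PROCEDURE Error_Add
    @procedure_run_id INT,
    @error_type       VARCHAR(128),
    @error_message    VARCHAR(4000),
    @record_id        VARCHAR(128) = NULL,
    @data_sample      VARCHAR(MAX) = NULL
AS
BEGIN
    BEGIN TRY
        INSERT INTO run_error
            (procedure_run_id, error_type, error_message,
             record_id, data_sample)
        VALUES
            (@procedure_run_id, @error_type, @error_message,
             @record_id, @data_sample);
    END TRY
    BEGIN CATCH
        -- Swallow the failure: a metrics problem must never take down
        -- the operation it is measuring. At most, note it somewhere
        -- cheap and guaranteed, such as the session output.
        PRINT 'Error_Add failed: ' + ERROR_MESSAGE();
    END CATCH
END;
```

Apply the same guard to the other five procedures.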
Summary:
This simple add-on feature will supply an impressive level of management and control to your daily operations. With it, you should be able to hand off more of the support details to more junior members of the IT team. However, before you dash off to your next project, consider some of the further uses you might find for this data. How about sending error reports directly to the responsible parties? I will go into more uses for this data, including one ninja-level query, in the next article.