White Paper on Custom Change Data Capture Solution for SQL Server Databases

1.                Introduction

1.1            Background

Operational systems in Enterprise Applications normally have relational databases as their persistent data store. Business operations on these applications lead to creation, update and deletion of data from tables within relational databases. MI (Management information) and BI (Business Intelligence) is generated as part of nightly batches in downstream Applications. Downstream applications look at change log to find out incremental changes happened in the source operational systems. This incremental data is helpful in processing very specific changes to business data thereby avoiding full processing of entire data within operational databases.

New age Applications provide publish-subscribe based service for publishing selective changes to relevant business data. For others, specific Database Replication tools such as SQL Server Transaction Replication, Attunity Replicate and others are used to provide incremental change events.

However on some occasions for legacy applications database replication tools cannot be used without needing substantial changes to Legacy applications.

Downstream systems are then forced to look at entire content from Operational database to process generate incremental meaningful information as part of overnight batch jobs. This approach increases overall batch execution time and increases strain on infrastructure due to additional load processed on a daily basis.

1.2            Problem Statement

A Custom Change Data Capture solution is needed to generate change log pointing to specific change events generated by operational system. This solution should ensure that no changes are needed either in operational system or in downstream system. Downstream system should only see specific changes that happened since the last time batch was processed.

1.3            Purpose

The purpose of this White Paper is to provide solution overview of Custom Change Data Capture (Custom CDC) Application for identifying daily incremental changes on specific SQL Databases.

2.                Solution Overview

Change data can be generated by comparing two consecutive versions of operational database. All required tables can be compared to find out changes between two consecutive versions. Consecutive version can be of two consecutive days or two consecutive hours or any other time gap. Smaller the gap, more the number of change versions.

Consider, Source Database is n

Consider, Consecutive Database is n+1

Following are the possibilities of comparison

-         All rows present in n but not in n+1 have been deleted

-         All rows present in n+1 but not in n have been inserted

-         All rows present in n and n+1, but have some difference in any required column have been updated.

Following diagram gives a solution overview of a real world implementation of Custom Change Data Capture solution for SQL Server Database. Same design approach can be used for any other database application.

Figure 1: Custom CDC Overview

2.1            Source DB

Source Operational System provides business data to downstream systems. Operational System has a relational database referred to as Source DB in above diagram.

2.2            Full Data Replication

Full data replication can be attained using custom SQL scripts or using tools such as SQL Server Snapshot Replication, Attunity etc.

2.3            SQL Server Database N

SQL Server Database (N) forms prior copy of source database business data. This Database will be used by downstream applications. This database will be overwritten with new copy of source database as soon as change data capture application completes its processing.

2.4            SQL Server Database N+1

SQL Server database (N+1) gets the latest copy of Source DB. Once Custom CDC Application has identified the incremental changes between latest copy and the prior copy of Copy Databases, then Custom CDC Application will copy N+1 Database to N Database.

2.5            SSIS ETL

SSIS ETL will be used to compare N and N+1 copies of Databases and will generate Change Tables. This ETL will also initiate a full copy of data from Database N+1 to N at the end of the process. This removes any coupling with downstream Applications.

2.6            Change Database

Change Database contains Change Tables for all pre-configured tables. Change Tables generated by Custom CDC Application match the format of Change Tables generated by standard data replication tools. This ensures that incremental changes are recorded and made available to consuming applications independent of source systems.

2.7            Tivoli Work Scheduler

Tivoli Work Scheduler is used for scheduling execution of Custom CDC Application for generating change logs in Change Database. Any other enterprise scheduler can be used.

2.8            Audit Table

Custom CDC Application generates audit output while performing change data capture between N and (N+1) versions of Databases. This information can be used for monitoring of Custom CDC Application batches in real time. Also performance of Custom CDC Application can be gauged using audit entries within Audit tables.

2.9            Configuration Table

Configuration Table provides configurability of Tables from Source DB that need to be considered for generating change data capture records. Specific columns that are required for change capture can also be configured within the Configuration Table to filter out any unwanted changes in Source DB.

3.                Detailed Design

Custom CDC Application uses CONFIG tables to pick a set of tables and their columns for comparison between the prior copy (N) and the latest copy (N+1) of Source Database. This is done to pick selective changes in the Change Database and reduce efforts required by downstream Application to filter out unwanted changes.

SQL Server Table Diff utility is used for comparing tables across N and N+1 database. Table Diff returns a listing of all rows which are present ONLY in N, ONLY in N+1 and those which are different between N and N+1.

Any row that is ONLY present in latest Copy Database is captured as an INSERT. Any row that is ONLY present in previous Copy Database is captured as DELTE. Any row that differs between latest and previous Copy Databases is captured as an UPDATE.

Once all tables have been compared and INSERTS, UPDATES and DELETES have been captured, Custom CDC Application commits all changes to Change Database.

Following flow diagram gives a rough view of the flow of activities that are conducted as part of execution of Custom CDC Application.

Figure 2: Custom CDC Detailed Design

4.                Error Handling & Monitoring

Custom CDC Solution will abort its processing and will not commit any changes to Change Database if any error is encountered either during comparison of latest and prior copy of Source DB (operational database) or during recording of changes in the respective Change databases.

Status changes of Custom CDC Application will be recorded in Batch Audit database to give a real time view of CDC Application. Log entries are generated while Custom CDC Application compares each configured table as part of database comparison.

To view or add a comment, sign in

More articles by Anand Singh

Others also viewed

Explore content categories