Incrementally loading your BI system using SQL Server Change Data Capture
As mentioned in my previous post using SQL Server Change Data Capture (CDC) is a powerful way to incremental process data into your Business Intelligence system.
CDC not only allows you to process new and updated records but you can also elegantly handle deleted records, something that is typically difficult to do using a manually built load control process.
Any changes done to a record are stored in change tracking tables in the SQL Server database that is CDC enabled for up to 3 days. The amount of time that changes are stored can easily be increased or decreased depending on your needs. This change tracking history acts as a buffer giving you a window of time to recover from any ETL failures.
It is always good practice to monitor the number of changes occurring on the tables in your CDC enabled database. This allows you to keep tabs on your replication process which is quite often a black box process especially when using third party tools.
The following T-SQL code snippet allows you to see how many changes have occurred over a specified time frame on a single table or on all tables in your CDC enabled database.
--DROP PROCEDURE dbo.procGetNumberOfTableChanges
CREATE PROCEDURE dbo.procGetNumberOfTableChanges(@startdatetime datetime, @enddatetime datetime, @tablename VARCHAR(MAX)='')
/* How to use this procedure!!!!!
1. For a specific timeframe and table
EXEC dbo.procGetNumberOfTableChanges @startdatetime='2017-08-14 9:30:00.000', @enddatetime='2017-08-14 15:59:59.999', @tablename='[schema].[table]'
2. For a specific timeframe and all tables
EXEC dbo.procGetNumberOfTableChanges @startdatetime='2017-08-14 9:30:00.000', @enddatetime='2017-08-14 15:59:59.999', @tablename=''
3. For past 24 hrs and all tables
EXEC dbo.procGetNumberOfTableChanges @startdatetime=NULL, @enddatetime=NULL, @tablename=''
*/
AS
BEGIN
DECLARE
@from_lsn varchar(500),
@to_lsn varchar(500),
@SQLString Nvarchar(MAX),
@recCnt int,
@ParmDefinition Nvarchar(500);
IF @startdatetime IS NULL
SET @startdatetime = GETDATE() -1;
IF @enddatetime IS NULL
SET @enddatetime = GETDATE();
SET @from_lsn = Convert(varchar(500), sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @startdatetime), 1);
SET @to_lsn = Convert(varchar(500), sys.fn_cdc_map_time_to_lsn('largest less than or equal', @enddatetime),1);
IF @tablename = ''
BEGIN
DECLARE
@Cur as CURSOR,
@Schema AS VARCHAR(MAX),
@Tbl as VARCHAR(MAX);
DECLARE @MyTable AS TABLE(SchemaName VARCHAR(MAX) NULL, TableName VARCHAR(MAX) NULL, RecCnt int NULL);
SET @Cur = CURSOR FOR SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;
OPEN @Cur;
FETCH NEXT FROM @Cur INTO @Schema, @Tbl;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQLString = N'SELECT @recCntOUT = COUNT(*) FROM cdc.fn_cdc_get_net_changes_';
SET @Tbl = REPLACE(@Tbl, ' ', '_');
SET @SQLString = Concat(@SQLString, Concat(@Schema,'_', @Tbl), '(', @from_lsn, ',', @to_lsn, ',''all'')');
SET @ParmDefinition = N'@recCntOUT int OUTPUT';
BEGIN TRY
EXECUTE sp_executesql @SQLString, @ParmDefinition, @recCntOUT=@recCnt OUTPUT;
INSERT INTO @MyTable (SchemaName, TableName, RecCnt)
SELECT @Schema, @Tbl, @recCnt;
END TRY
BEGIN CATCH
DECLARE @ErrorMessage NVARCHAR(4000);
DECLARE @ErrorSeverity INT;
DECLARE @ErrorState INT;
SELECT
@ErrorMessage = ERROR_MESSAGE(),
@ErrorSeverity = ERROR_SEVERITY(),
@ErrorState = ERROR_STATE();
IF ERROR_NUMBER() <> 208
RAISERROR (@ErrorMessage, -- Message text. @ErrorSeverity, -- Severity. @ErrorState -- State. );
END CATCH
FETCH NEXT FROM @Cur INTO @Schema, @Tbl;
END;
SELECT * FROM @MyTable;
END
ELSE
BEGIN
DECLARE
@SingleTbl VARCHAR(MAX);
SET @SingleTbl = REPLACE(REPLACE(REPLACE(@tablename,'].[','_'),'[',''),']','')
SET @SingleTbl = REPLACE(@SingleTbl, ' ', '_');
SET @SQLString = N'SELECT @recCntOUT = COUNT(*) FROM cdc.fn_cdc_get_net_changes_';
SET @SQLString = Concat(@SQLString, @SingleTbl, '(', @from_lsn, ',', @to_lsn, ',''all'')');
SET @ParmDefinition = N'@recCntOUT int OUTPUT';
EXECUTE sp_executesql @SQLString, @ParmDefinition, @recCntOUT=@recCnt OUTPUT;
SELECT @tablename as TableName, @recCnt as RecCnt;
END
END;
The results of this database procedure could be incorporated into an SSRS report so that users could easily monitor the replication process and see how many changes are occurring at any given point in time.
Stay tuned for a future post in which I compare CDC to Temporal Tables a new feature for tracking changes in SQL Server 2016 and later.
Regards
Anthony
PS In case you are wondering yes that is my daughter in the background not Tarzan :)