Enhanced Threading Framework for SSIS

I'm really proud of my Enhanced Threading Framework (ETF) which is built on-top of SQL Server Integration Services (SSIS).   The original reason for building it was two-fold.  The main driver being a business problem to solve and the secondary impulse was to see how far I could push myself and the SSIS environment.  The framework takes parallel processing in SSIS to the next level.

Since then I have revisited the framework a number of times and have come out with my 3rd iteration of it.  I am pleased to say that it has passed a significant test thanks in part to my good friend Lorenzo Matthews, who contacted me after attending my PASS 2013 session: "Unseen Minions : Improving parallel processing in SSIS."  

He said that his company, IMPLAN, LLC, had a unique situation and thought my framework could help.  It was a fantastic challenge and I am proud to say that both he and IMPLAN, LLC, are happy with the results.

The nature of the test and the results are posted below:

Data: 1 years worth of data was contained in 3,575 Microsoft Access databases, with each database containing 162 tables, totaling 3.1 billion records. 

Task: Ingest 12 years of past data and then use this process to ingest incoming data.  They required a new, stable ingestion process that would complete in a timely manner and allow data to be re-ingested any time it was required.  

Current Process:  When ingesting even a single year the routine would run for 3 to 4 days and often fail to complete.

Solution: I leveraged the Enhanced Threading Framework for SSIS (ETF) that I developed to resolve Implan’s predicament.

System:  SSIS and SQL Server; running on separate boxes, but with identical specs:

  • 16 GB memory
  • 64-bit operating system
  • Intel Xenon CPU E5-2690 v2 @ 3GHz (6 processors)

Results: After implementing the ETF for Implan, a year’s worth of data (3.1 billion records) is now ingested in approximately 11 hours; an average of 79,000 rows per second.

The ingestion runs to completion without error and can be rerun whenever needed.

The 79k limit is the max their particular SQL Server configuration was able to ingest.  The SSIS machine had plenty of room to spare.  The framework itself is limited only by two factors:

  • Maximum amount of data the given SQL Server configuration can ingest
  • Maximum number of processors (threads) on the SSIS machine.

To view or add a comment, sign in

More articles by Josef Richberg

Others also viewed

Explore content categories