Future of Data Integration Platform - Merging Technologies

Previously (and still in most places), there were two distinct tracks of data integration, which we called batch and real-time. Batch integrations were ETL in nature in most cases, though we also used the term ETL for loading data into data warehouses and for data migrations. The batch vs. real-time debate is a classic in most organizations, and you can't ignore it if you work in the data integration space. I have been fortunate to be part of many such discussions and debates.


But over the past few years the technologies have evolved hugely, and the big integration vendors have done a lot of work in the data integration space. As a result, these contrasting technologies have moved toward each other in terms of integration capabilities. Real-time technologies no longer shy away from claiming to process big volumes, and traditional batch products no longer shy away from processing batches on demand (in real time). So have the two ends really met?


Of course not, as of today. Though the products have been re-engineered and nurtured to move toward the other end, their core capabilities and the base on which they were developed can't be taken away. You can't take your old Volkswagen and turn it into a new Ferrari, however hard you try. You might get close, but in the process you might realize that buying a Ferrari in the first place would have been the better decision, if of course that's what you want. And if you can afford it, you might want to keep both, in case you need each at some time or other.


The same rule applies when choosing your data integration platform. That said, I have observed that almost all medium to large organizations need both flavors to support their various data integration needs. You have to balance volume, transformation complexity and availability. I call these the triple constraints of data integration.


How I see the future of data integration: I believe most data integration vendors are trying really hard to minimize this gap and, to get there quickly, will group products into one suite which they can then call 'The Ultimate Data Integration Suite'. For example, how difficult would it be for a big DI vendor that has both capabilities to bring both products under one suite with common metadata and a uniform logging and monitoring framework? Success, though, will depend on how they bundle and price the products, as in this competitive market that will be the make-or-break factor. But to me that will still be a marketing gimmick, as there will be two totally disparate systems, originally developed in isolation, working underneath a common framework.

But then some vendor (I hope and I wish) will come up with a bottom-up architecture that addresses this data integration need, and will succeed. The day we get there, we will stop debating batch vs. real-time. Instead we will treat all data integrations uniformly, and any organization will have one data integration platform of choice, able to perform each integration as needed. Once that happens, the debate is gone. The debate is about technology choice and the internal politics of where the work will finally land. When no choice is needed, there's nothing to debate.
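
To make "treating all data integrations uniformly" a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any vendor's product works: the names integrate, normalize_customer, batch_source and event_source are all hypothetical. The idea is that one transformation is written once and applied to a bounded batch and to a stream of events through the same code path.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def integrate(source: Iterable[Record],
              transform: Callable[[Record], Record]) -> Iterator[Record]:
    """Lazily apply one transformation to any record source.

    The function does not know (or care) whether the source is a
    bounded batch or an unbounded stream of events.
    """
    for record in source:
        yield transform(record)


def normalize_customer(record: Record) -> Record:
    # Hypothetical transformation: trim and title-case the name.
    return {"id": record["id"], "name": str(record["name"]).strip().title()}


def batch_source() -> List[Record]:
    # Stand-in for a nightly extract: the full set is known up front.
    return [{"id": 1, "name": " alice "}, {"id": 2, "name": "BOB"}]


def event_source() -> Iterator[Record]:
    # Stand-in for a real-time feed: records arrive one at a time.
    yield {"id": 3, "name": "carol  "}


# The same integration logic serves both styles.
batch_result = list(integrate(batch_source(), normalize_customer))
stream_result = list(integrate(event_source(), normalize_customer))
```

A real platform would of course also have to handle volume, availability, error recovery and monitoring; the sketch only shows the part where the integration logic itself no longer depends on the batch vs. real-time choice.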


So for me, the future is exciting, and DI consultants like me need to stay up to speed on these developments. Add to this the cloud platforms (replacing traditional on-premises hardware) and NoSQL distributed DB endpoints, which demand specific knowledge of the big data space. I am running, and you might want to as well.



Image courtesy - Actuate (http://birt.actuate.com/)

