"Outsource" your data warehouse
My beliefs firmed up after attending Hadoop World 2015 in New York two weeks ago. Some fundamental trends support my prediction that within a few years companies will outsource their data warehouse, if not all of it, then good parts of it. Here are the trends driving it:
- The existing investments in data-warehousing technology make it impossible to re-engineer piece by piece. Companies have too much invested in traditional warehouse technologies, and retrofitting their processes to the new world is non-trivial: slow and painful. It is also a portfolio problem: of the 1,500 reports being generated, nobody knows how many are actually used, or at what frequency. Meanwhile, the demand for new reports and analytics shows no sign of dying down.
- Hadoop is mainstream now. It is no longer an experiment, or something meant only for data scientists running statistical models, but a challenger to the current data-warehouse setup (yes, I know I am using "data warehouse" loosely, and there are trendier distinctions between data lakes, data warehouses, data marts, and so on). Its performance on structured and unstructured data beats the traditional models, and the scalability is phenomenal.
- Companies focused on managing and integrating third-party data sources are strongly established or emerging. Take D&B or Achilles, which deal with supplier data.
- Enrichment with a lot of external data is an absolute reality and a very strong need in the business community.
- End users are becoming smarter, helped by a wave of visualization technology. New specialized roles in the business-analyst space, including statisticians and data scientists, are now a decision-making focus for most organizations.
- Cloud infrastructure is more secure than most companies' existing infrastructure. Like it or not, this shift is going to happen, except where very specific regulatory needs warrant much more, and even there, cloud players are offering point solutions.
- Schema-on-read and nested columnar designs are the new ways of thinking about data modeling.
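To make that last point concrete, here is a minimal sketch in plain Python (not Hadoop itself, and with made-up vendor records) of what schema-on-read means: the raw records are stored as-is, and structure is applied only when a query reads them, so new fields arrive without any schema migration.

```python
import json

# Schema-on-write would force every record into a fixed table layout up front.
# Schema-on-read keeps the raw records untouched and projects a structure at query time.
raw_records = [
    '{"vendor": "Acme", "spend": 1200}',
    '{"vendor": "Globex", "spend": 800, "region": "EMEA"}',  # new field, no migration needed
]

def read_with_schema(lines, fields):
    """Project each raw JSON record onto the fields this particular query cares about."""
    return [{f: json.loads(line).get(f) for f in fields} for line in lines]

# Two different "schemas" read from the same stored data:
spend_view = read_with_schema(raw_records, ["vendor", "spend"])
region_view = read_with_schema(raw_records, ["vendor", "region"])
```

A record missing a projected field simply yields `None` for it; in a real Hadoop stack the same idea is what lets engines query raw files long after they were written.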
So how does all this relate to my prediction? Well, think of your procurement department trying to look at vendor trends combined with five years of transaction history with those vendors. If I ask my in-house data team to deliver these new reports, it will take them months: getting to the external data sources, building the aggregations, battling the infrastructure teams for more servers, and so on. By the time they are almost done, the business team will ask for a different source and different aggregations around the data. That may change the timeline (depending on how smartly the design was done) and, more importantly, how it "fits" into the existing data structures.
Contrast this with a company that runs and maintains vendor masters for a living. It provides this through the cloud with real-time integration to other sources. It is built on the new Hadoop architecture, which provides scaling, schema-on-read, and nested columnar designs; it has pre-built agreements with various third-party data sources and can ingest your data and provide the outputs you need.
I think it is a very compelling story. I would readily agree if you said "we are not there yet". But with what is happening on both the business side and the technology side of the house, we are not far from the time when we will outsource our data warehouse. Three to five years, maybe more?
Tushar, 👍
I agree with you completely, but most of these are common application-lifecycle problems. The question then remains: how much does one want to outsource?
Good, compelling argument, Tushar sir :)