"Outsource" your data warehouse
My beliefs firmed up after attending Hadoop World 2015 in New York two weeks ago. Some fundamental trends support my prediction that within a few years companies will outsource their data warehouse, if not all of it, then good parts of it. Here are the trends driving it:
- The existing investments in data-warehousing technology make it impossible to re-engineer piece by piece. Companies have too much invested in traditional warehouse technologies, and retrofitting their processes to the new world is non-trivial: slow and painful. It is also a portfolio problem: of the 1,500 reports being generated, nobody knows how many are actually used, or at what frequency. Meanwhile, the demand for new reports and analytics shows no sign of dying down.
- Hadoop is mainstream now. It is no longer an experiment, or something meant only for data scientists running statistical models, but a challenger to the current data-warehouse setup (yes, I know I am using "data warehouse" loosely, and there are trendier distinctions between data lakes, data warehouses, data marts, and so on). Its performance on structured and unstructured data beats the traditional models, and the scalability is phenomenal.
- Companies focused on managing and integrating third-party data sources are strongly established or emerging. Take D&B or Achilles, which deal with supplier data.
- Enrichment with a lot of external data is an absolute reality and a very strong need in the business community.
- End users are becoming smarter, helped by a wave of visualization technology. New specialized roles in the business-analyst space, including statisticians and data scientists, are now a decision-making focus for most organizations.
- Cloud infrastructure is more secure than most companies' existing infrastructure. Like it or not, this shift is going to happen, except where very specific regulatory needs warrant much more, and even there, cloud players are offering point solutions.
- Schema-on-read and nested columnar designs are the new ways of thinking about data modeling.
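To make that last point concrete, here is a minimal sketch in plain Python (not Hadoop itself, and with made-up vendor records) of what schema-on-read means: the raw records are stored as-is, and structure is applied only when a query reads them, so new fields arrive without any schema migration.

```python
import json

# Schema-on-write would force every record into a fixed table layout up front.
# Schema-on-read keeps the raw records untouched and projects a structure at query time.
raw_records = [
    '{"vendor": "Acme", "spend": 1200}',
    '{"vendor": "Globex", "spend": 800, "region": "EMEA"}',  # new field, no migration needed
]

def read_with_schema(lines, fields):
    """Project each raw JSON record onto the fields this particular query cares about."""
    return [{f: json.loads(line).get(f) for f in fields} for line in lines]

# Two different "schemas" read from the same stored data:
spend_view = read_with_schema(raw_records, ["vendor", "spend"])
region_view = read_with_schema(raw_records, ["vendor", "region"])
```

A record missing a projected field simply yields `None` for it; in a real Hadoop stack the same idea is what lets engines query raw files long after they were written.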
So how does all this relate to my prediction? Well, think of your procurement department trying to look at vendor trends combined with five years of transaction history with those vendors. If I ask my in-house data team to deliver these new reports, it will take them months: getting to the external data sources, building the aggregations, battling the infrastructure teams for more servers, and so on. By the time they are almost done, the business team will ask for a different source and different aggregations around the data. That may change the timeline (depending on how smartly the design was done) and, more importantly, how it "fits" into the existing data structures.
Contrast this with a company that runs and maintains vendor masters for a living. It provides this through the cloud with real-time integration to other sources. It is built on the new Hadoop architecture, which provides scaling, schema-on-read, and nested columnar designs; it has pre-built agreements with various third-party data sources and can ingest your data and provide the outputs you need.
I think it is a very compelling story. I would readily agree if you said "we are not there yet". But with what is happening on both the business side and the technology side of the house, we are not far from the time when we will outsource our data warehouse. Three to five years, maybe more?
Tushar, 👍
I agree with you completely, but most of these are common application-lifecycle problems. The question then remains: how much does one want to outsource?
Good, compelling argument, Tushar sir :)