Why does it take me so long to get data at work?

Consider the following situation. You have a meeting tomorrow morning with your management team in which you would like to argue that the company should invest in a sales team in London, a market where you currently have no local presence. Your hypothesis is that sales in that area are rising and you want to capitalise on the momentum. You would like a report to back you up, but you realise that getting it means joining a long queue of similar requests, at the end of which you would receive a static report. So, of course, you skip the process and enter your meeting with no data. You are not a data-driven business. In fact, even with the report you would only be a data-aided business: without forecasting, simulation and similar capabilities, you are a long way from backing your hypothesis with anything tangible. Like most enterprise companies today, you are in the same state as the story above. Why is this happening? Because the Data Warehouse, which would traditionally solve this, is not addressing the requirements of the modern business. The Data Warehouse is still a critical piece of the overall story, but it was not purpose-built to meet certain demands of today's enterprise companies.

When we ask our enterprise customers what the front line of the business needs, the answer is loud and clear: data needs to be un-siloed, flexible, high quality and far more ready to use than it is today. This requires new ways of thinking, new technology and a new drive from the C-level to initiate the programme. To start with, the business needs to be able to explore what data is available. They need to know what they can "play" with - otherwise it is just guesswork. This starts with some type of public catalog, but not a catalog in the way the enterprise is used to. We are talking about a catalog for searching business objects, not tables, columns and unblended data sources. Those are too raw, too technical, and not ready in any meaning of the word.
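To make the distinction concrete, here is a toy sketch of what "searching business objects, not tables" could look like. The catalog entries, field names and search rule are illustrative assumptions, not any particular product's API:

```python
# A toy business-object catalog: the user searches for business concepts
# ("Customer", "Order"), and only the catalog knows which physical tables
# back each one. All entries here are invented for the sketch.
CATALOG = [
    {"object": "Customer",
     "description": "Unified customer across CRM and billing",
     "backed_by": ["crm.accounts", "billing.clients"]},
    {"object": "Order",
     "description": "Sales orders from all channels",
     "backed_by": ["shop.orders", "erp.sales_lines"]},
]

def search(term):
    """Match on the business object and its description, never on table names."""
    term = term.lower()
    return [entry for entry in CATALOG
            if term in entry["object"].lower()
            or term in entry["description"].lower()]

hits = search("customer")
```

The point of the sketch is the indirection: the business user never has to know that "Customer" is stitched together from `crm.accounts` and `billing.clients`.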

You have probably seen videos and keynote sessions where data solution vendors type the word "Customers" and say "look how easy it is to find my customers across my entire stack". Don't be fooled: the work has only just begun. What the business really wants is to type "Customers" and get back a list of customers that has already been unified and merged from all their systems, already cleaned, normalised, enriched and governed - and then to be able to direct that data to a platform purpose-fit for extracting value from it, whether through visualisation or business intelligence.
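The "already unified and merged" part is where most of the hidden work sits. A minimal sketch of that merge step, assuming a deliberately naive matching rule (records match when their normalised email addresses are equal; real entity resolution is far harder), with invented system and field names:

```python
# Merge "Customer" records from several source systems into one record per
# person. Matching on a normalised email is an assumption for the sketch;
# production systems need proper entity resolution.

def normalise_email(email):
    """Normalise an email so records from different systems can match."""
    return email.strip().lower()

def unify_customers(*sources):
    """Return one merged record per unique (normalised) email address."""
    unified = {}
    for system in sources:
        for record in system:
            key = normalise_email(record["email"])
            merged = unified.setdefault(key, {"email": key, "sources": []})
            merged["sources"].append(record["system"])
            # Later systems only fill in fields earlier ones were missing.
            for field, value in record.items():
                if field not in ("email", "system") and value:
                    merged.setdefault(field, value)
    return list(unified.values())

crm = [{"system": "crm", "email": "Ada@Example.com", "name": "Ada Lovelace"}]
billing = [{"system": "billing", "email": "ada@example.com ", "city": "London"}]

customers = unify_customers(crm, billing)
```

Even this toy version has to make policy decisions (which system wins on a conflict, how to normalise keys), which is exactly the work the keynote demos skip over.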

This type of experience does not exist within the enterprise today, and yet it is exactly what is needed to meet the demands of common, modern data use cases.

It is about friction.

The good news is that the solution is conceptually easy to understand, even if technically challenging to achieve. Conceptually, it lies in pushing the point of data access much later in the pipeline than it sits today. Data access by the business typically happens at the raw level (an export from the source) or, better, at the Data Lake or Data Warehouse level. The source or the Lake is far too early; the Data Warehouse is far too late. The trade-off with the Data Lake is that it is achievable, but don't expect any tangible results within 6 to 12 months. We need to introduce technology into this pipeline that pushes data access to the stage where the data has already been cleaned, already been governed, already passed through legal checks, and already had consent, retention periods and data policies managed. It needs to have already been cleaned, normalised, tracked and enriched. Yes, this is a lot of things, and it is exactly what your individual business units are doing for themselves now, for every single use case - and the worst part is that this collective work never propagates back to a universal master data management tool or, better yet, to the source tool.
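One way to picture "pushing data access later" is as a gate that only opens once the required preparation stages have stamped the record. A minimal sketch, where the stage names and the required set are assumptions made up for illustration:

```python
# Each preparation stage stamps the record with what has been done to it;
# business access is only granted once every required stamp is present.
# Stage names here are illustrative, not a standard.
REQUIRED_BEFORE_ACCESS = {"cleaned", "governed", "consent_checked"}

def stage(name):
    """Build a pipeline stage that records the work it has done."""
    def run(record):
        record["steps"].add(name)
        return record
    return run

clean = stage("cleaned")
govern = stage("governed")
check_consent = stage("consent_checked")

def access(record):
    """Grant business access only to fully prepared data."""
    missing = REQUIRED_BEFORE_ACCESS - record["steps"]
    if missing:
        raise PermissionError(f"not ready for access, missing: {sorted(missing)}")
    return record

raw = {"id": 42, "steps": set()}
ready = access(check_consent(govern(clean(raw))))
```

Accessing `raw` directly would raise, which is the point: the earlier access points (source export, Lake) correspond to records that have not earned their stamps yet.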

The solution is a proactive framework that makes all of these steps something handled after the Data Lake but before the Data Warehouse. It is to provide universal tools for each of these mechanisms of data preparation, while offering the flexibility that if people would like to use other tools for individual pieces, the data foundation makes that seamless. It is sometimes a balancing act to know how much to prepare the data: not so much that it removes flexibility, but not so little that you simply shift the work to your business units. The good news is that you can have the best of both worlds via a platform, or data foundation, that allows you to act on any single version of the data that you want - cleaned but not normalised, say, or normalised but not governed. A good data foundation should allow for this flexibility.
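A minimal sketch of that "pick your preparation level" idea: keep a snapshot of the data after each stage, and let consumers ask for the stage they want. The class, stage names and transforms are all assumptions invented for the sketch:

```python
from copy import deepcopy

class DataFoundation:
    """Toy foundation that snapshots the data after every preparation stage."""

    def __init__(self, raw_records):
        self.snapshots = {"raw": deepcopy(raw_records)}

    def apply_stage(self, name, transform):
        """Run a stage over the latest snapshot and keep the result by name."""
        latest = deepcopy(list(self.snapshots.values())[-1])
        self.snapshots[name] = [transform(r) for r in latest]

    def get(self, stage):
        """Return the data exactly as it looked after the named stage."""
        return self.snapshots[stage]

foundation = DataFoundation([{"name": "  Ada  ", "country": "uk"}])
foundation.apply_stage("cleaned", lambda r: {**r, "name": r["name"].strip()})
foundation.apply_stage("normalised", lambda r: {**r, "country": r["country"].upper()})
```

A consumer who wants "cleaned but not normalised" calls `foundation.get("cleaned")`; the deep copies ensure later stages never mutate earlier versions.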

If the queue is clean, why is it still so long? We have heard from many of the customers and companies we talk with that "it doesn't take that long to get data to the people requesting it". But you need to think about what is meant by "long". In the examples these same companies gave us, 6 weeks, or maybe 3 months, counted as "fast". You will never be a data-driven business with this setup. If your business cannot turn around an ad hoc request for data in under a day, you will never be as data driven as you could be. Why is there a queue to begin with? Because of the stage at which data access is done. In most companies today, data access involves a business analyst putting together a SQL query with multiple joins to get the data needed. Once the joins are in place, you can query extremely quickly and apply whatever filtering is necessary. But what if the query needs changing, or you suddenly need new data from different data sources? Well, you join the back of the queue and wait again. This is not data driven, and it does not lead to being data driven.
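To make the shape of that hand-written work concrete, here is the kind of multi-join query the paragraph describes, runnable against an in-memory SQLite database. The tables, columns and figures are invented for the sketch:

```python
import sqlite3

# Invented schema and data standing in for two source systems an analyst
# would have to join by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Alan', 'Manchester');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# The analyst's hand-built query: sales per customer, filtered to one city.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_sales
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.city = 'London'
    GROUP BY c.name
""").fetchall()
```

Swapping `'London'` for another city is trivial once the joins exist; the queue forms when the request needs a table that is not yet joined in, because that means going back to whoever owns the pipeline.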

Once we introduce the Data Foundation, will we, in a few years, just need to introduce yet another part of the pipeline? Yes - without hesitation, yes. This is exactly how technology works and exactly how we should embrace it. It is also part of the importance of the Data Foundation that it replaces nothing, but rather complements the already existing data pipeline and data strategy. I can foresee a day when something fits in between the Data Lake and the Data Foundation, and likewise between the Data Foundation and the Data Warehouse or downstream consumers. This may come in the form of another abstraction layer provided by new database technologies, or better support from vendors for a common standard on how to get data into a Data Lake. Herein lies the true value of the Data Foundation: establishing a proper stitching strategy for improving a data pipeline over time and, once robustness has been established, optimising that pipeline for cheaper operation, faster performance, more business value or stronger business insights.

Easy data accessibility will be the norm in the enterprise within the next 5 to 10 years, and long gone will be the days when we bend a Data Warehouse, Data Lake or other technology to provide value it was not designed to provide. The infrastructure needed will be vast and, at the start, expensive, but as technology advances it will become much more approachable for more and more businesses.



More articles by Tim Ward
