The new cloud data ecosystems
How many times have you heard or read that data is the new oil, or that AI is the new electricity? I assume often enough to perceive it as an empty mantra that everybody repeats.
Don't get me wrong, I truly believe that today every company is a data company. The challenge, however, is always the same: how can I extract insights from the past to learn and predict the future in a way that lets me make decisions that deliver business value? In a nutshell, how do I become a data-driven company?
The challenge can be faced from many different angles. There is the technological versus the cultural angle. There is the type of project that needs to happen first: migrating existing data platforms to the cloud to exploit them later, building new data platforms from scratch in the cloud, or starting with AI projects to motivate breaking more data silos. There are the functions and features that need to be considered: [Data/ML]Ops, Data Governance, Data Lineage, Catalog, MDM... And there is a very long list of providers and technologies to evaluate.
However, after two and a half years in my current role talking with customers and partners across 12 European countries, I've seen that some patterns and elements are always present in the most successful projects. Below you can see an over-simplification of them:
- Organization: This is probably one of the biggest enablers in any project, maybe even more than data and technology themselves. You need to consider to what extent the organizational strategy, culture, leadership and funding support your Data Analytics and AI projects. How much time do organizational leaders commit to kicking off the project? How much time do they invest in the data governance meetings? Without the proper organizational umbrella, projects are set up for failure.
- Cloud Infrastructure: What is the underlying infrastructure? How does it support the whole company versus individual business units? Which technologies are in place and how do they integrate into the existing environment? What are the data integration capabilities? I want to be purposefully controversial on this point: today I can't imagine a new data analytics project that is not in the cloud. I assume you can find many reasons to justify doing it on-premises. Some will argue security, others data residency, data sovereignty, internal policies, latency... all lies you tell yourself to avoid or delay the decision. Data Analytics and AI innovation comes from the cloud, period.
- Data: Where is the data I need? Who owns it? Is it "siloed" or centralized? Is it structured or unstructured? Is it enriched with third-party data sources? Old as the concept is, the three Vs still apply: what are the Variety, Volume and Velocity of the data used, and how is it managed to support analytics and AI? Does everyone have access to, and see, the same data truth?
- Analytics & Insights: Consider how advanced the organization is in its use of data. What is the data used for? How does it impact company decisions? Which roles work with the data? What skills are required to extract insights from it? Which business processes are automated? Do standard reports exist, or how much time do people spend manually finding the data, exporting it, adjusting it and extracting insights?
- Governance: Is governance done consistently across the organization? Do we have a common business vocabulary? Is there a data governance classification to set up access security, data privacy and data retention? What are the data governance roles and responsibilities? Do we have data governance processes for the entire life cycle of the data? How do you provide an audit trail that shows where data originated and how it has been transformed?
- Ethics: This might sound accessory to projects, but the reality is that we are the first generation in the history of humanity to entrust computers with decisions that were previously made only by people. It's therefore imperative that we have a clear process in place to ensure that those decisions are fair, transparent, inclusive and reliable. That won't happen out of the blue; you need to be intentional about defining your data analytics and AI ethics.
Enabling all these components and answering this non-exhaustive list of questions will help you build the data foundation for a modern cloud data ecosystem in your organization.
The process is not trivial, but the benefits are massive. Once the data foundations of your organization are in place, you will be able to review each of your business processes and start redefining them based on data. This will put you on the right path to shift away from IT-oriented analytics towards an enterprise-grade, enterprise-oriented analytics solution.
Good contribution from Steen Rasmussen, who would "add an extra column on the left called Activation, which is the reason why we should invest in analytics at all". I am not sure this is the way I would include it, but he is completely right that it is always necessary to have a "why". There is no point in introducing technology simply for the sake of technology. Technology is "just" the enabler of a purpose.
Mikael Munck