Developing An AI-enabled Data Platform Strategy
In the last decade, data platforms and architectures have evolved from structured RDBMS platforms, curated dashboards, reporting and BI, to big data (broad, unstructured assets) and predictive and prescriptive analytics.
However, data and analytics are poised for a much bigger change with AI: strategies that enterprises adopted as recently as five years ago are already turning out to be outdated. AI has upended the idea of data and analytics in several ways, most of which hinge on the idea that data no longer has to be precise, and that information can be extracted or extrapolated given enough indicators. These indicators are what we commonly call prompts, training techniques and historical data.
The evolution and change have been exponential in the last few years, and data and analytics functions in most organizations are not used to this much change, this fast. However, exploiting the opportunities presented by AI requires a sense of urgency to stay ahead of the competition in both top-line and bottom-line value.
The Changed Landscape
The shift from traditional data architectures to AI-driven platforms is a shift from passive reporting of "what happened" to near real-time "active intelligence". We evolved from schema-on-write modeling for data warehouses to schema-on-read for data lakes (NoSQL/Hadoop). Today's GenAI-supportive architecture, however, goes beyond both and focuses on semantic understanding and multi-modal data.
The following table summarizes, across seven dimensions, the salient changes that I believe enterprises need to consider as they develop next-generation data and analytics strategies; they are easier to read as a table than as text.
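To make the schema-on-write versus schema-on-read distinction concrete, here is a minimal Python sketch. The `ORDER_SCHEMA` fields and both function names are hypothetical, and plain lists stand in for a warehouse table and a data lake; real platforms enforce these rules in their storage and query engines, not in application code.

```python
import json
from typing import Any

# Hypothetical target schema for a warehouse table: field name -> declared type.
ORDER_SCHEMA: dict[str, type] = {"order_id": int, "amount": float, "region": str}


def write_with_schema(table: list[dict[str, Any]], raw: str) -> None:
    """Schema-on-write: validate and coerce at ingest time, reject what doesn't fit."""
    record = json.loads(raw)
    missing = ORDER_SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"rejected at write time, missing fields: {sorted(missing)}")
    # Only declared fields are stored, each coerced to its declared type;
    # anything else in the payload (e.g. free-text notes) is dropped.
    table.append({name: typ(record[name]) for name, typ in ORDER_SCHEMA.items()})


def read_with_schema(raw_store: list[str], fields: list[str]) -> list[dict[str, Any]]:
    """Schema-on-read: raw payloads are kept as-is; structure is imposed at query time."""
    rows = []
    for raw in raw_store:
        record = json.loads(raw)
        # A field absent from a given payload comes back as None instead of failing.
        rows.append({f: record.get(f) for f in fields})
    return rows


warehouse: list[dict[str, Any]] = []
write_with_schema(warehouse, '{"order_id": "7", "amount": "19.5", "region": "EU", "note": "gift"}')
print(warehouse)  # [{'order_id': 7, 'amount': 19.5, 'region': 'EU'}]

lake = ['{"order_id": 7, "amount": 19.5}', '{"order_id": 8, "region": "US", "note": "rush"}']
print(read_with_schema(lake, ["order_id", "note"]))
# [{'order_id': 7, 'note': None}, {'order_id': 8, 'note': 'rush'}]
```

The trade-off shows up directly: the warehouse path rejects or trims data up front so every query sees clean, typed rows, while the lake path accepts anything and pushes interpretation (and missing values) onto each reader.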
Taking Action To Adapt to the New Model
Following is a recommended approach for shifting to the new model and adapting an enterprise data strategy to exploit the opportunities AI presents.
The recommendations are listed roughly in order, since each builds on the capabilities and prerequisites established by the previous ones.
1. Decide the Right AI Data Strategy
One of the common challenges enterprises face today is deciding whether a single platform should serve both traditional reporting/BI/predictive analytics and GenAI capabilities. In other words, can an enterprise datalake also be the source of all enterprise data for GenAI use cases? Depending on how data teams are organized in an enterprise, it is possible to build one single platform for all enterprise data. However, most organizations will likely end up with two different platforms with some form of cross-access between them.
This decision likely has the most impact on the AI data strategy.
2. Start by Building a Semantic Layer
Enterprises have been building semantic layers for some time now, even before current AI capabilities. Building a semantic layer starts with data classification across the enterprise: cataloging data in warehouses, datalakes, RDBMS, ODS, SaaS apps, ITSM tickets, emails, file repositories, etc. to identify the data that is truly useful for AI insights.
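As an illustration of the classification step, the sketch below models catalog entries and splits them into assets that could feed AI use cases and assets that need review first. The `DataAsset` fields, the sensitivity levels and the readiness rule (an assigned steward plus a non-restricted sensitivity level) are assumptions chosen for illustration, not a standard; a real catalog tool would carry far richer metadata.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """One entry in a hypothetical enterprise data catalog."""
    name: str
    source: str        # e.g. "warehouse", "datalake", "saas", "itsm", "email"
    domain: str        # owning business domain
    sensitivity: str   # assumed levels: "public", "internal", "confidential", "restricted"
    has_steward: bool  # is an accountable data steward assigned?


# Assumed policy: sensitivity levels that may be exposed to AI use cases without extra review.
AI_ALLOWED_SENSITIVITY = {"public", "internal"}


def classify_for_ai(assets: list[DataAsset]) -> dict[str, list[str]]:
    """Split catalog entries into AI-ready assets and assets that need review first."""
    result: dict[str, list[str]] = {"ai_ready": [], "needs_review": []}
    for asset in assets:
        ready = asset.has_steward and asset.sensitivity in AI_ALLOWED_SENSITIVITY
        result["ai_ready" if ready else "needs_review"].append(asset.name)
    return result


catalog = [
    DataAsset("product_catalog", "warehouse", "sales", "public", True),
    DataAsset("it_tickets", "itsm", "operations", "internal", False),
    DataAsset("hr_salaries", "rdbms", "hr", "restricted", True),
]
print(classify_for_ai(catalog))
# {'ai_ready': ['product_catalog'], 'needs_review': ['it_tickets', 'hr_salaries']}
```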
3. Incrementally Develop Lakehouse Architecture
This can be done incrementally: collapsing data warehouses and datalakes into a lakehouse, at least for some subset of data, is a good starting point for developing AI use cases. As discussed earlier, the full potential of AI can be realized when all forms of data are in one place. Eventually, as enterprises mature, they can progressively invest in and grow out their lakehouse.
4. Standardize Vector Databases With RAG Pipelines
Purpose-built vector databases loaded with enterprise data, supporting RAG pipelines for easy querying and connectivity to industry LLM-enabled tools, can extract value quickly with minimal effort. All the data enterprises carry in reports, emails, messages, contracts, etc. needs to be easily queryable by AI tools, and vector databases provide this by indexing content for similarity search. Enterprises can start with high-value, low-hanging opportunities such as product catalogs, legal contracts and financial filings, and demonstrate value first before scaling.
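The retrieval half of a RAG pipeline can be sketched in a few lines. This is a self-contained toy under stated assumptions: `embed` uses a bag-of-words term count instead of a real embedding model, `InMemoryVectorIndex` is a hypothetical stand-in for a vector database that does brute-force cosine similarity, and the actual LLM call is left out. The shape (embed, index, retrieve top-k, ground the prompt in the retrieved passages) is what carries over to production tools.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector.
    A real pipeline would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database: stores vectors, brute-force search."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, Counter]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._rows.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        """Return the k most similar (doc_id, score) pairs, best first."""
        q = embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, _, vec in self._rows]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

    def text(self, doc_id: str) -> str:
        return next(t for d, t, _ in self._rows if d == doc_id)


def build_rag_prompt(index: InMemoryVectorIndex, question: str, k: int = 2) -> str:
    """Retrieve top-k passages and assemble a grounded prompt (the LLM call is not shown)."""
    context = "\n".join(f"[{d}] {index.text(d)}" for d, _ in index.search(question, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


index = InMemoryVectorIndex()
index.add("contract-17", "The supplier contract renews automatically every 12 months.")
index.add("catalog-3", "Model X laptop ships with 16 GB memory and a 1 TB drive.")
index.add("filing-q2", "Q2 revenue grew 8 percent year over year.")

question = "When does the supplier contract renew?"
print(index.search(question, k=1)[0][0])  # contract-17
print(build_rag_prompt(index, question, k=1))
```

Swapping the toy pieces for a real embedding model and vector database changes the quality of retrieval, not the structure of the pipeline.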
5. Modernize Data Pipelines
It is probably time to revisit traditional, old-school data pipelines and progressively replace them with intelligent, AI agent-driven data pipelines. These are not only more resilient but also richer in capabilities, handling a wider variety of data formats and structures. An agentic data pipeline can do away with the lag and staleness of data moving through systems for analytics and replace it with real-time analytical capability. The lines between operational and analytical reporting can then quickly blur, driving value faster.
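One way to picture what "agent-driven" adds to a pipeline is schema-drift handling. The sketch below processes events one at a time and, when it meets an unfamiliar field name, asks a pluggable resolver instead of failing the run; records it still cannot make sense of go to a quarantine list. `rule_based_resolver` is a hypothetical stub standing in for an AI agent (for example, an LLM asked to map a renamed column), and `TARGET_FIELDS` and the event shapes are assumptions for illustration.

```python
from typing import Callable, Iterable, Optional

# Assumed target schema of the analytical table.
TARGET_FIELDS = {"customer_id", "amount", "currency"}

# A resolver maps an unfamiliar source field name to a target field, or returns None.
Resolver = Callable[[str], Optional[str]]


def rule_based_resolver(field_name: str) -> Optional[str]:
    """Hypothetical stub standing in for an AI agent that maps drifted column names."""
    synonyms = {"cust_id": "customer_id", "customerId": "customer_id",
                "amt": "amount", "total": "amount", "ccy": "currency"}
    return synonyms.get(field_name)


def run_pipeline(events: Iterable[dict], resolve: Resolver):
    """Process events one at a time; repair schema drift instead of failing the whole run.
    Returns (loaded_rows, quarantined) where quarantined holds (event, unresolved_fields)."""
    loaded: list[dict] = []
    quarantined: list[tuple[dict, list[str]]] = []
    learned: dict[str, str] = {}  # cached mappings, so the (costly) resolver runs once per name
    for event in events:
        row: dict = {}
        unresolved: list[str] = []
        for key, value in event.items():
            if key in TARGET_FIELDS:
                target = key
            else:
                target = learned.get(key) or resolve(key)
            if target in TARGET_FIELDS:
                if key != target:
                    learned[key] = target
                row[target] = value
            else:
                unresolved.append(key)
        # Only complete, fully understood rows are loaded; the rest wait for review.
        if unresolved or row.keys() != TARGET_FIELDS:
            quarantined.append((event, unresolved))
        else:
            loaded.append(row)
    return loaded, quarantined


events = [
    {"customer_id": 1, "amount": 10.0, "currency": "EUR"},
    {"cust_id": 2, "amt": 5.5, "ccy": "USD"},             # drifted column names
    {"customer_id": 3, "amount": 7.0, "channel": "web"},  # unknown field, no currency
]
loaded, quarantined = run_pipeline(events, rule_based_resolver)
print(len(loaded), len(quarantined))  # 2 1
```

Caching resolved mappings matters because an agent call is far slower and costlier than a dictionary lookup; quarantining rather than dropping keeps a human in the loop for anything the agent could not map.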
6. Establish Strong Data Governance For AI
This is probably something to start in parallel with the other actions. Any strategy or platform developed to exploit opportunities with AI needs strong data governance across the enterprise. A formal, cross-functional data governance council is a prerequisite for success and for getting value out of the investment in AI. It should include technical leaders (CIO, CDO, data architects), legal and compliance, risk, HR, and LoB representatives. The council decides on use cases, data needs, the tools used, and the RACI (responsible, accountable, consulted, informed) for use of data and AI output. Deloitte recommends anchoring this in a broader AI governance charter. The current data organization structure and the role of a Chief AI Officer (CAIO) should also be revisited: the CDO owns data infrastructure, quality, and governance, while the CAIO owns model strategy, AI deployment, and responsible AI policy.
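A RACI matrix is simple enough to keep as data and check automatically. The sketch below assumes one letter per role per activity and applies the common RACI rule of exactly one Accountable and at least one Responsible per activity; the activities and role assignments are hypothetical examples, not a recommended allocation.

```python
# Hypothetical RACI matrix: activity -> {role: one of "R", "A", "C", "I"}.
RACI: dict[str, dict[str, str]] = {
    "approve AI use case": {"CAIO": "A", "CDO": "R", "Legal": "C", "LoB": "I"},
    "grant data access": {"CDO": "A", "Data architect": "R", "Risk": "C"},
    "release AI output to users": {"CAIO": "A", "LoB": "R", "Compliance": "C", "HR": "I"},
}

VALID_LETTERS = set("RACI")


def validate_raci(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of rule violations; an empty list means the matrix passes."""
    problems: list[str] = []
    for activity, assignments in matrix.items():
        letters = list(assignments.values())
        invalid = sorted({x for x in letters if x not in VALID_LETTERS})
        if invalid:
            problems.append(f"{activity}: unknown letters {invalid}")
        accountable = letters.count("A")
        if accountable != 1:
            problems.append(f"{activity}: needs exactly one Accountable, found {accountable}")
        if "R" not in letters:
            problems.append(f"{activity}: needs at least one Responsible")
    return problems


print(validate_raci(RACI))  # []
print(validate_raci({"retrain model": {"CAIO": "A", "CDO": "A"}}))
# ['retrain model: needs exactly one Accountable, found 2', 'retrain model: needs at least one Responsible']
```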
Summary
With today's emerging technology, data and analytics strategy is evolving from aggregating data for humans to interpret on dashboards to aggregating data so that AI can comprehend it and present insights directly to the user. Enterprises are still adapting to the last change, from data warehouses to datalakes, and it is not easy to pivot again to a new model. However, change is needed. A deliberate, progressive approach with a clear financial model built on ROI, rather than open-ended R&D, can help organizations build an AI-enabled data platform.
(This is the third piece in a series on developing enterprise technology strategy in light of the rise of AI capabilities. Feedback is welcome.)