Building a Centralized “Single Version of the Truth” Enterprise Data Lake: Hit the Line Hard
The data lake architecture is gaining huge momentum and enabling a paradigm shift from traditional data-warehouse-based reporting to an enterprise data warehouse leveraging advanced Business Intelligence tools.
The term “data lake” is now mainstream in enterprise data management, with Hadoop becoming the new normal for data storage. Because a data lake can accommodate huge volumes of data and integrate with varied data sources, from consolidated external reporting systems to ERP transactional systems at the lowest level of transaction detail, it lays a solid foundation for building a centralized “Single version of the Truth”. Big data and its associated technologies originated to manage massive volumes of unstructured or social-media-type data and to meet the need to curate and report on such data. However, these technologies have evolved rapidly and can now access, acquire, curate, analyze and report on all manner of data (structured, semi-structured and unstructured), including enterprise business data from a wide variety of ERP sources.
This paradigm shift forces Chief Information Officers (CIOs) and enterprise IT strategists and architects to take a hard look at the business case for investing in consolidating many different transactional ERP platforms into one “giant-horse” unified ERP platform, as opposed to enabling a centralized enterprise data reporting and analytics platform. I am not ruling out the rationale or business advantage of consolidating very small niche or home-grown legacy ERPs into a major tier 1 or tier 2 ERP, or of driving a strategy of region-based or process-based IT application consolidation. However, the idea of an enterprise-wide global transformation to converge all ERPs into one common ERP platform seems to be a thing of the past. Given the dynamic nature of business in today’s enterprise, multi-year global ERP convergence or consolidation programs pose a bigger risk to the enterprise and rarely succeed.
In the last few months, as I have worked through the enterprise data strategy, architecture and design to build a data lake solution for the Finance business function at Johnson Controls, I have realized that three guiding principles are absolutely critical to building and sustaining such an enterprise data lake platform and to realizing the long-term business goal and value of such an initiative.
These guiding principles may appear practical, common-sense and straightforward, but in many organizations they are not given due importance or are given lower priority for various reasons: a short-term vision, conflicting priorities, cost or schedule pressures and many more. As a result, these initiatives end up being a buzzword for some time and soon become obsolete or irrelevant to business stakeholders.
1. Top of House engagement – Building an enterprise data management and analytics platform should not be perceived as just IT development work to unify all data on one common technical platform. Alignment between executive leadership and the leaders of the respective business functions, so that the right people from the business groups are engaged in the initiative, is a critical success factor. This top management engagement is useful not only during the build phase of such an enterprise reporting solution but, most importantly, to sustain and leverage the business value as part of steady-state operations after project completion.
2. Set a broader objective – Even though such an initiative may originate in one business group or sub-group, organizations need to take a broader view of how centralizing enterprise data in one place, with one version of the truth, can enable multiple use cases across functions and help drive enterprise value. Based on my practical experience building an enterprise finance data lake, I could envisage multiple use cases within the finance function alone – FP&A or financial reporting, tax compliance and legal entity reporting, internal audit and compliance tracking, profitability and margin reporting, product pricing and quotation management, centralized country finance functions such as local GAAP reporting, and many more.
Setting a broader business objective and pooling multiple use cases will also help the IT delivery organization take a centralized, holistic view when investing in the right set of tools, technologies and platforms required to support the overall solution architecture. This broader view will enable simplification, repeatability and consistency over the long term.
3. Well governed data model (data governance in general) – We all understand that when it comes to enterprise ERP applications, all data and the underlying business rules are actually owned by the business, so it is highly important to establish robust data lineage to make the enterprise data lake accessible in a relevant way to all business functions. This means that implementing such initiatives should also institutionalize a robust governance mechanism spanning both IT and business organizations for long-term sustainability.
Business users must be given an intuitive, user-friendly means to ensure that mappings and business rules are set up correctly and maintained periodically. Adopting the right ETL or data preparation software tools should therefore be a key technology component of the whole solution.
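As a rough illustration of the kind of check a business user might want on mappings, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `AccountMapping` record, the `validate_mappings` function, the system names and the account codes are hypothetical and do not come from any particular ETL or data preparation tool. It flags source accounts that map to two different targets, mappings that point at an unknown target account, mappings with no business owner, and source accounts with no mapping at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountMapping:
    """Hypothetical mapping rule from a source ERP account to a common account."""
    source_system: str
    source_account: str
    target_account: str
    owner: str  # business owner accountable for this rule (assumed field)


def validate_mappings(mappings, source_accounts, target_accounts):
    """Return a list of human-readable issues found in the mapping rules.

    source_accounts: set of (system, account) pairs present in the source ERPs
    target_accounts: set of account codes in the common data model
    """
    issues = []
    seen = {}  # (system, account) -> first target account encountered
    for m in mappings:
        key = (m.source_system, m.source_account)
        if key in seen and seen[key] != m.target_account:
            issues.append(
                f"conflict: {key[0]}/{key[1]} maps to both "
                f"{seen[key]} and {m.target_account}"
            )
        seen.setdefault(key, m.target_account)
        if m.target_account not in target_accounts:
            issues.append(
                f"unknown target: {key[0]}/{key[1]} -> {m.target_account}"
            )
        if not m.owner:
            issues.append(f"no owner: {key[0]}/{key[1]}")
    # Source accounts that no rule covers at all.
    for system, account in sorted(source_accounts - set(seen)):
        issues.append(f"unmapped: {system}/{account}")
    return issues


# Illustrative data only; the codes and owners are made up.
mappings = [
    AccountMapping("SAP", "400000", "REV-001", "fpna"),
    AccountMapping("SAP", "400000", "REV-002", "fpna"),
    AccountMapping("Oracle", "5100", "COGS-999", "tax"),
]
source_accounts = {("SAP", "400000"), ("Oracle", "5100"), ("Oracle", "5200")}
target_accounts = {"REV-001", "REV-002", "COGS-001"}

issues = validate_mappings(mappings, source_accounts, target_accounts)
for issue in issues:
    print(issue)
```

In practice a report like this would more likely be produced inside the chosen data preparation tool and routed to the business owner of each rule; the point of the sketch is only that the checks themselves are simple enough for business users to read and own.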
The true value of engaging business process owners and SMEs in such initiatives shows when the business is able to start driving process changes or improvements, standardize processes and enhance data quality, based on full insight into enterprise data from a central source of truth. The long-term engagement of business process owners will also enable a Hub and Spoke model for managing and consuming modeled enterprise data across multiple use cases or opportunities.
Lastly, it is important to realize that the path to building a centralized “Single version of the Truth”, and to aligning all stakeholders with these critical success elements, could be a bumpy and painful process. The only way to get to the goal line is to hit the line hard, keep pushing persistently and stay firm on your feet, making sure these fundamental guiding principles are well set and understood within the organization in parallel with building the centralized “Single version of the Truth” enterprise data lake!
"Single version of truth" --> I think overall data ownership (data being a kind of Rubik's cube of master data, transaction data and configuration data) is always a murky area when defining accountability in most organizations. In the few EA data programs I have been associated with, there was a constant tussle over who really owns the data, not only in the good old “IT vs. Business” accountability sense but even within each of the siloed business process functions. There is no easy answer, since SAP does not offer clear-cut visibility into slicing ownership by organizational function, even when defining MDG systems, which SAP claims as a featured data hub. Some claim that it is an internal organizational issue, but even in GxP-type blue-chip companies the core issue is always the same: data ownership! That is why, after a few decades and apart from a few workflow systems, companies still struggle to build and manage a single source of truth. Imagine a complex Material Master with almost 15+ views, all inter-related across SC, Logistics, Sales, WH, Finance, QA and Mfg, and more complex still if it is an extension of an R&D or Development database. Unless the SC organization is uniformly defined, process owners will look only into their own silos of data. Practically assigning a single GPO for the Material Master (who will take ownership of all 12+ views) is the biggest challenge in most data-sensitive organizations. It is a complex issue, but the most value-adding one if solved properly. Data is something that directly hits the core of an organization, yet a lot of decision makers still think it is an overhead function to be managed like other service functions. Sorry for the rather lengthy comment :-) but this good old area is quite make-or-break for ERP consolidations, which are now being discussed big time!