The MBI Framework: A Case Study in Metadata-Driven Data Warehouse Automation (circa 2008)
1. Abstract: This paper details the MBI (Meta Base Integration) framework, an internally-developed, metadata-driven system implemented around 2008 to address challenges in enterprise data warehousing. Faced with the need for faster development cycles, improved consistency, and enhanced maintainability, the MBI framework provided a comprehensive solution for automating the generation and management of a multi-layered data warehouse architecture. Key features include a PL/SQL-based generation engine utilizing standardized templates managed through a layered API (Definitions, Control, Execution, Utilities), a structured task execution framework, and the automated creation of both Data Vault (CDV, CDU) and presentation (PKB) layers. This system represents a significant, early example of applying Data Warehouse Automation (DWA) principles to achieve "conformed, controlled, condensed code" across the data warehouse lifecycle.
2. Introduction
The mid-to-late 2000s presented significant challenges for enterprise data warehousing. Traditional development approaches often resulted in lengthy project timelines, high development and maintenance costs, difficulties in enforcing standards, and challenges ensuring consistency across complex environments. Extract, Transform, Load (ETL) processes were frequently complex, resource-intensive, and difficult to adapt to evolving business requirements. Concepts like Data Warehouse Automation (DWA) and metadata-driven development were emerging as potential solutions, promising accelerated delivery, improved quality, and greater agility through the use of reusable patterns and automated code generation.
Against this backdrop, the MBI (Meta Base Integration) framework was developed internally around 2008 as a strategic solution to these challenges. The fundamental premise of the MBI API was "to enable people to make available code in a conformed and controlled manner," aiming to reduce duplication, increase conformance to standards, and enhance the ability to improve code over time. It provided a comprehensive, metadata-driven platform built primarily using Oracle PL/SQL to automate the creation, deployment, and management of data warehouse components across a multi-layered architecture.
This paper presents a detailed architectural overview of the MBI system, examining its core framework components, its implementation of a layered architecture including Data Vault and presentation layers, and analyzing the features that positioned it as an advanced system for its time. We will explore the design principles, the metadata-driven generation engine, the task execution framework, and the resulting data warehouse structure based on technical documentation and code artifacts from the system.
3. MBI System Architecture
The MBI system implements a multi-layered data warehousing architecture designed to systematically ingest, integrate, conform, and present data for analytical and export purposes. The overall process flow, depicted in the system's flow diagram, reveals distinct stages connected by standardized "Generic loader" processes, indicative of the MBI framework's automated data movement capabilities.
3.1 Staging Layer (Owner_oxt, OWK) At the base of the architecture lies the Staging layer. While specific details of Owner_oxt and OWK are limited in the provided documents, their position in the flow diagram suggests they handle the initial ingestion of data from various source systems (indicated as potentially including SAP, SBL, SMS, SYL, VIS, DTL, EAN based on CDV/CDU view definitions). This layer likely performs minimal transformations, focusing on capturing raw data before it enters the main integration processes. Components like the Sig loader might represent specific ingestion mechanisms.
3.2 Conformed Data Vault Layer (Owner_cdv) Data moves from the Staging layer into the Conformed Data Vault (CDV), implemented within the Owner_cdv schema. This layer serves as the primary integration point, designed using Data Vault modeling principles. Its key characteristics, evident from the schema definition, include the separation of business keys, relationships, and descriptive attributes into Hubs, Links, and Satellites, together with full historical tracking and auditability of source data.
3.3 Conformed Data Universe Layer (Owner_cdu) Following the CDV, data is further processed into the Conformed Data Universe (CDU) layer, represented by the Owner_cdu schema. This layer also adheres to Data Vault principles but involves applying more business rules and conforming data across the different source representations present in the CDV. The architecture diagram labels this as the domain of "CONFORMED PROCESSES" and describes the CDU as holding a "Conformed set of entities populated from the CDV and internal sources". Like the CDV, it comprises Hubs, Links, and Satellites, now conformed across source systems.
3.4 Conformed Processes Layer (AKB, SKB) Moving up the architecture from the CDU, the process flow diagram depicts layers labeled AKB and SKB under the heading "CONFORMED PROCESSES". While specific schema definitions for these layers were not provided in the reviewed materials, their position suggests they serve as intermediate, potentially subject-oriented data marts or further refined analytical layers. They consume data from Owner_cdu via generic loaders, implying they structure the conformed Data Vault information into formats more readily consumable for specific business intelligence or reporting needs before data reaches the final presentation layer.
3.5 Presentation / Export Layer (Owner_pkb) The Owner_pkb schema represents the primary presentation layer, as shown in the "EXPORT PROCESSES" section of the flow diagram. Based on the schema definition and its position in the architecture, this layer transforms data sourced from the upstream layers (specifically shown flowing from SKB) into structures optimized for end-user querying, reporting, and data export.
3.6 Export and Cache Finally, the process flow diagram shows data moving from Owner_pkb through an "Export process" and potentially into a "Data cache". This represents the final delivery mechanism, where data structured in PKB is extracted for downstream applications, external partners, or perhaps loaded into a cache for performance optimization of specific reporting tools. The pkb_dl_sylivan_bic package provides an example of such an export process, writing data to files.
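The export step can be sketched as follows. In the original system, pkb_dl_sylivan_bic is a PL/SQL package writing PKB data to files (presumably via UTL_FILE against a directory object); the Python sketch below, including the delimiter and file layout, is an illustrative assumption rather than the package's actual output format.

```python
# Sketch of a PKB export process: write presentation-layer rows to a
# delimited file for downstream consumers. Column names, delimiter, and
# sample data are illustrative assumptions.
import csv
import io

def export_rows(rows, header):
    """Write rows to a delimited in-memory file, analogous to an export
    package writing to a file via UTL_FILE."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";")
    writer.writerow(header)   # header line for the downstream consumer
    writer.writerows(rows)    # one delimited line per exported record
    return buf.getvalue()

out = export_rows([("1001", "ACME", "2008-05-01")],
                  ["cust_id", "cust_name", "snapshot_date"])
print(out)
```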
3.7 Architectural Summary In summary, the MBI system architecture represents a structured, multi-stage data flow: from initial staging (OXT/OWK), through integrated Data Vault layers (CDV, CDU) providing a historical and auditable core, into potentially subject-oriented conformed layers (AKB/SKB), and finally culminating in a presentation layer (PKB) optimized for consumption and export. The consistent use of generic loaders between layers highlights the automated, framework-driven nature of the data integration process managed by the MBI API.
4. The MBI API Framework (OWNER_MBI)
The core engine driving the automation and standardization within the architecture is the MBI API framework, implemented primarily within the OWNER_MBI schema. This framework embodies the principles of "Do it once, Do it in one place, Conform to standards, Make it available, Make it improvable, Enable control". Its fundamental purpose is to provide a controlled and consistent method for generating and executing code, particularly DDL and potentially other script types, based on predefined, parameterized templates and metadata. The API is structured into distinct functional layers, ensuring separation of concerns and promoting reusability.
4.1 Definitions Layer (MBI_PKG_API_DEFN_XXX) This layer serves as the repository for code templates or "frameworks". Instead of storing templates in tables, the definitions are implemented as functions within dedicated PL/SQL packages (e.g., MBI_PKG_API_DEFN_DDL, MBI_PKG_API_DEFN_DD, MBI_PKG_API_DEFN_MTV). Each function (e.g., MBI_FTN_API_DEFN_INDEX, MBI_FTN_API_DEFN_PK_CONSTRAINT) returns a CLOB containing the parameterized code string. Parameters within the templates are denoted using a consistent syntax, such as <<PARAMETER_NAME>>. Creating new definitions involves taking a standard script, identifying variable elements, and replacing them with these placeholders.
4.2 Control Layer (MBI_PKG_API_CONTROL) The Control layer acts as the engine for processing the definition templates. Its primary function, encapsulated within MBI_FTN_API_CONTROL, is to take a requested definition name (p_defn) and a collection of parameter names and values (name_val_tab of type mbi_types.param_tt), retrieve the corresponding template, and substitute the placeholders with the provided values. This is handled by sub-functions that retrieve the requested template and perform the placeholder substitution.
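The Definitions/Control interaction can be illustrated with a short sketch. The original implementation is Oracle PL/SQL (template functions returning CLOBs, parameters passed as a mbi_types.param_tt collection); the Python below mirrors the documented names (MBI_FTN_API_DEFN_INDEX, MBI_FTN_API_CONTROL, the <<PARAMETER_NAME>> syntax) but its exact shapes are assumptions for illustration.

```python
# Sketch of the MBI Definitions + Control pattern: a definition function
# returns a parameterized template; the control function retrieves it and
# substitutes <<PLACEHOLDER>> tokens with supplied values.

def mbi_ftn_api_defn_index() -> str:
    """Definitions layer: return a parameterized DDL template
    (a CLOB in the PL/SQL original)."""
    return ("CREATE INDEX <<INDEX_NAME>> "
            "ON <<TABLE_NAME>> (<<COLUMN_LIST>>)")

def mbi_ftn_api_control(p_defn: str, name_val_tab: dict) -> str:
    """Control layer: look up the requested definition and substitute
    each <<NAME>> placeholder with its value."""
    templates = {"INDEX": mbi_ftn_api_defn_index}
    script = templates[p_defn]()                 # retrieve the template
    for name, value in name_val_tab.items():     # substitute placeholders
        script = script.replace(f"<<{name}>>", value)
    return script

ddl = mbi_ftn_api_control("INDEX", {
    "INDEX_NAME": "cdv_cust_h_ix1",
    "TABLE_NAME": "owner_cdv.cust_h",
    "COLUMN_LIST": "cust_bk",
})
print(ddl)  # CREATE INDEX cdv_cust_h_ix1 ON owner_cdv.cust_h (cust_bk)
```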
4.3 Execution Layer (MBI_PKG_API_EXECUTE, MBI_API) This layer is responsible for invoking the code generation process and executing the resulting scripts.
4.4 Utilities Layer (MBI_PKG_API_UTILS_XXX) While the Definition, Control, and Execution layers provide the core generation mechanism, the Utilities layer offers a crucial level of abstraction and encapsulates higher-level logic and standards. Implemented as procedures within specific utility packages (e.g., MBI_PKG_API_UTILS_DDL, MBI_PKG_API_UTILS_MTV, MBI_PKG_API_UTILS_DD), these utilities serve several key purposes: they derive parameters from existing metadata (notably the Oracle data dictionary), enforce naming and structural standards, and orchestrate multi-step generation so that callers need supply only minimal input.
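A utility-layer procedure can be sketched as follows. In the real framework this logic is PL/SQL querying the Oracle data dictionary (e.g., ALL_TAB_COLUMNS); here a plain dictionary stands in for it, and the naming convention, column order, and procedure name are illustrative assumptions.

```python
# Sketch of a Utilities-layer procedure: derive parameters from metadata,
# enforce a naming standard, then generate DDL through the template engine.

FAKE_DICTIONARY = {  # stand-in for the Oracle data dictionary
    ("OWNER_CDV", "CUST_H"): ["CUST_HK", "CUST_BK", "LOAD_DTS", "REC_SRC"],
}

def template_pk_constraint() -> str:
    return ("ALTER TABLE <<TABLE_OWNER>>.<<TABLE_NAME>> "
            "ADD CONSTRAINT <<CONSTRAINT_NAME>> PRIMARY KEY (<<KEY_COLUMN>>)")

def substitute(template: str, params: dict) -> str:
    for name, value in params.items():
        template = template.replace(f"<<{name}>>", value)
    return template

def utils_add_pk(owner: str, table: str) -> str:
    """Derive the key column from metadata and the constraint name from a
    naming convention, so the caller supplies only owner and table."""
    key_col = FAKE_DICTIONARY[(owner, table)][0]  # convention: first column is the key
    return substitute(template_pk_constraint(), {
        "TABLE_OWNER": owner,
        "TABLE_NAME": table,
        "CONSTRAINT_NAME": f"{table}_PK",         # enforced naming standard
        "KEY_COLUMN": key_col,
    })

pk_ddl = utils_add_pk("OWNER_CDV", "CUST_H")
print(pk_ddl)
```

The design point is that the caller never passes column names or constraint names: the utility derives them, which is how the framework enforces "Do it once, Do it in one place, Conform to standards".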
4.5 Overall API Flow The typical flow when using the MBI API involves a calling process (e.g., a deployment script, a scheduled task) invoking either a specific Utility procedure (MBI_PKG_API_UTILS_XXX.ProcedureName) or directly calling the Execution layer's request procedure (MBI_PKG_API_EXECUTE.MBI_PRC_API_REQUEST) with a definition name and necessary parameters.
This layered, metadata-driven approach provides a powerful, flexible, and controlled mechanism for automating code generation and management within the data warehouse environment.
5. Implementation Aspects
The MBI framework and the associated data warehouse architecture were implemented primarily using Oracle database technologies, with PL/SQL forming the backbone of the generation engine, task execution, and loading processes. Several key implementation choices characterize the system:
5.1 Technology Stack: The core logic resides within Oracle PL/SQL packages stored within the OWNER_MBI schema. This includes the API framework (Definitions, Control, Execution, Utilities), the task control and logging mechanisms, and the standardized data loaders. The data warehouse layers (CDV, CDU, PKB) are implemented as standard Oracle schemas containing tables, views, sequences, and other objects generated or managed by the framework.
5.2 Metadata Utilization: A cornerstone of the MBI framework is its extensive use of metadata. Rather than relying solely on parameters passed during execution, the Utility packages (MBI_PKG_API_UTILS_XXX) actively query the Oracle data dictionary to derive necessary information and enforce standards. Examples include deriving column lists, data types, and key definitions for a given table directly from dictionary views, so that callers need not supply them explicitly.
5.3 Task Execution Framework: The MBI system includes a robust task execution framework built around several repository tables and control packages within OWNER_MBI, notably mbi_taak_control and mbi_taak_logging, which together provide task definition, parameterization, batch-based dependency management, status tracking, and logging.
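The task execution pattern can be sketched in miniature. The original is PL/SQL (mbi_taak_control, mbi_taak_logging, repository tables); the Python structures below, including the status values and halt-on-failure behavior, are illustrative assumptions about how such a controller typically operates.

```python
# Sketch of a task execution framework: tasks registered in a repository,
# executed in batch order, with status tracking and logging.
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: int
    name: str
    params: dict
    status: str = "DEFINED"

@dataclass
class TaskController:
    tasks: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def register(self, task: Task):
        self.tasks.append(task)

    def run_batch(self, executor):
        """Run tasks in registration order, updating status and logging."""
        for task in self.tasks:
            task.status = "RUNNING"
            try:
                executor(task.params)
                task.status = "COMPLETED"
                self.log.append((task.task_id, "INFO", f"{task.name} done"))
            except Exception as exc:
                task.status = "FAILED"
                self.log.append((task.task_id, "ERROR", str(exc)))
                break  # halt the batch so dependent tasks do not run

ctl = TaskController()
ctl.register(Task(1, "load_cdv_cust_h", {"LOAD_METHOD": "MERGE"}))
ctl.register(Task(2, "load_cdu_cust_h", {"LOAD_METHOD": "MERGE"}))
ctl.run_batch(lambda params: None)  # no-op executor for the sketch
print([t.status for t in ctl.tasks])
```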
5.4 Standardized Data Loading (mbi_loader): The framework appears to include a dedicated package, mbi_loader, designed to handle standardized data loading patterns between layers, likely powering the "Generic loader" components shown in the architecture diagram. This package dynamically constructs SQL statements (INSERT, MERGE, DELETE) based on task parameters defining the source (SRC_), target (TGT_), load method (LOAD_METHOD), and specific column handling rules (e.g., AUDIT_COLS, DATE_COLS, MATCH_COLS, DEL_COLUMNS). It leverages helper functions to determine column lists and build appropriate clauses, ensuring consistency in how data is moved and integrated.
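A generic loader of this kind can be sketched by constructing a MERGE statement from task parameters. The parameter names (SRC_, TGT_, LOAD_METHOD, MATCH_COLS, AUDIT_COLS) come from the documentation; the DATA_COLS parameter and the exact SQL shape below are illustrative assumptions, not mbi_loader's actual output.

```python
# Sketch of mbi_loader-style dynamic SQL construction: build a MERGE from
# task parameters describing source, target, match keys, and column roles.

def build_merge(params: dict) -> str:
    match_cols = params["MATCH_COLS"].split(",")
    data_cols = params["DATA_COLS"].split(",")    # non-key columns to load
    audit_cols = params["AUDIT_COLS"].split(",")  # load timestamps etc.
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in match_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols + audit_cols)
    all_cols = match_cols + data_cols + audit_cols
    return (
        f"MERGE INTO {params['TGT_TABLE']} t "
        f"USING {params['SRC_TABLE']} s ON ({on_clause}) "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(all_cols)}) "
        f"VALUES ({', '.join('s.' + c for c in all_cols)})"
    )

merge_sql = build_merge({
    "SRC_TABLE": "owner_cdv.cust_s",
    "TGT_TABLE": "owner_cdu.cust_s",
    "LOAD_METHOD": "MERGE",
    "MATCH_COLS": "cust_hk",
    "DATA_COLS": "cust_name",
    "AUDIT_COLS": "load_dts",
})
print(merge_sql)
```

Because the statement is assembled from parameters rather than hand-written per table, every layer-to-layer load follows the same pattern, which is what makes the "Generic loader" boxes in the architecture diagram possible.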
5.5 Extensibility: The framework was designed with extensibility in mind. Adding support for generating new types of objects primarily involves creating a new parameterized template function in the appropriate DEFN package and, typically, a corresponding procedure in a UTILS package to handle parameter derivation and orchestration logic, following the examples provided in the documentation.
6. Advanced Features (Contextualized for 2008)
Evaluating the MBI framework and its associated data warehouse architecture within the context of data warehousing practices prevalent around 2008 reveals several features that can be considered advanced for their time. While concepts like Data Warehouse Automation (DWA), metadata-driven ETL, and alternative modeling techniques like Data Vault existed, their comprehensive and integrated implementation within a custom-built framework like MBI appears sophisticated.
6.1 Comprehensive Metadata-Driven Generation: The MBI framework's reliance on metadata was not merely for parameterization but extended to the automated generation of database objects and process logic across multiple architectural layers.
6.2 Structured PL/SQL Generation Framework: Building a comprehensive DWA framework entirely within Oracle PL/SQL was a distinct implementation choice. While commercial ETL tools and emerging open-source options existed, the MBI API's layered design (Definitions, Control, Execution, Utilities) provided a highly structured, internally controlled environment for code generation. This offered deep customization possibilities and potentially tight integration with the Oracle database, although it required specific PL/SQL expertise. The structured nature contrasts with potentially more ad-hoc scripting approaches sometimes used for automation.
6.3 Early and Automated Data Vault Adoption: In 2008, Data Vault was still gaining traction compared to traditional Kimball and Inmon methodologies. Choosing Data Vault as the core modeling paradigm for the integration layers (CDV and CDU) was itself a forward-looking decision, aiming to leverage its strengths in flexibility, scalability, and auditability. Critically, the MBI framework did not just adopt Data Vault conceptually but provided mechanisms to automate the creation and loading of its core structures (Hubs, Links, Satellites) based on metadata and templates, addressing one of the key challenges often associated with Data Vault implementation – the volume of objects to be created and managed.
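Automating Data Vault creation from metadata can be sketched as follows. The Hub and Satellite layouts, the _h/_s naming convention, and the column names below are illustrative assumptions in the style of Data Vault modeling, not the framework's actual templates.

```python
# Sketch of metadata-driven Data Vault generation: given a business entity,
# its business key, and its descriptive attributes, emit Hub and Satellite
# DDL from fixed parameterized templates.

HUB_TEMPLATE = (
    "CREATE TABLE <<SCHEMA>>.<<ENTITY>>_h (\n"
    "  <<ENTITY>>_hk RAW(16) NOT NULL,\n"
    "  <<BK_COLUMN>> VARCHAR2(100) NOT NULL,\n"
    "  load_dts DATE NOT NULL,\n"
    "  rec_src VARCHAR2(10) NOT NULL)"
)

SAT_TEMPLATE = (
    "CREATE TABLE <<SCHEMA>>.<<ENTITY>>_s (\n"
    "  <<ENTITY>>_hk RAW(16) NOT NULL,\n"
    "  load_dts DATE NOT NULL,\n"
    "  <<ATTR_COLUMNS>>,\n"
    "  rec_src VARCHAR2(10) NOT NULL)"
)

def generate_entity(schema: str, entity: str, bk: str, attrs: dict) -> list:
    """Emit Hub and Satellite DDL for one business entity from metadata."""
    attr_cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in attrs.items())
    params = {"SCHEMA": schema, "ENTITY": entity,
              "BK_COLUMN": bk, "ATTR_COLUMNS": attr_cols}
    scripts = []
    for template in (HUB_TEMPLATE, SAT_TEMPLATE):
        for name, value in params.items():
            template = template.replace(f"<<{name}>>", value)
        scripts.append(template)
    return scripts

hub_ddl, sat_ddl = generate_entity(
    "owner_cdv", "cust", "cust_nr",
    {"cust_name": "VARCHAR2(200)", "cust_segment": "VARCHAR2(30)"})
print(hub_ddl)
```

One metadata entry per entity yields the full set of vault objects, which addresses exactly the object-volume problem the text describes.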
6.4 Integrated Task Execution and Control: The framework included a built-in task repository and execution engine (mbi_taak_control, mbi_taak_logging, associated tables). This provided capabilities for scheduling (implied via task definitions), parameterization, dependency management (through batch definitions), status tracking, and logging within the same environment used for code generation. This level of integration offered better control and visibility compared to potentially fragmented approaches relying on separate scripting, scheduling, and logging tools.
While individual concepts like metadata usage or Data Vault existed in 2008, the MBI system's integration of these ideas into a comprehensive, internally developed PL/SQL framework for automating generation across multiple architectural layers, including the presentation layer, represents a notably advanced and ambitious implementation for its time.
7. Conclusion
The MBI framework, developed circa 2008, stands as a testament to the potential of applying metadata-driven principles and Data Warehouse Automation techniques using an internally developed, cohesive platform. This case study, reconstructed from technical documentation and code artifacts, reveals a sophisticated system designed to address prevalent data warehousing challenges of its time.
The system successfully implemented a multi-layered architecture, progressing data from staging through integrated Data Vault layers (Owner_cdv, Owner_cdu) and into a flexible presentation layer (Owner_pkb) suitable for diverse analytical needs. This structure, particularly the early and automated adoption of Data Vault modeling, provided a scalable and auditable core.
The engine driving this architecture, the MBI API framework (Owner_mbi), demonstrated significant strengths: comprehensive metadata-driven generation across multiple layers, a cleanly separated layered API (Definitions, Control, Execution, Utilities), an integrated task execution and logging framework, and straightforward extensibility through new definition and utility packages.
Considered within the context of 2008, the MBI system's integrated approach, its scope of automation (especially including presentation layer generation), and its implementation of Data Vault position it as an advanced, forward-looking solution. While commercial DWA and ETL tools were evolving concurrently, the MBI framework represents a powerful example of leveraging internal expertise and a metadata-centric philosophy within an Oracle PL/SQL environment to achieve significant automation and control.
Potential considerations or trade-offs for such an approach include the reliance on specialized PL/SQL skills for maintenance and extension, and the need to continuously evolve the internal framework in parallel with commercial tool advancements. However, the MBI system serves as a valuable case study illustrating how ambitious DWA goals could be realized early on through a well-structured, metadata-driven, custom-built framework.
8. Closing Remark
The MBI system must be one of the first fully automated enterprise data warehouses to incorporate Data Vault, and it includes many features that were adopted years later in Data Vault 2.0.