Enterprise Data and Analytics Integration
Many manufacturing companies are investing in large data and analytics infrastructures. These deployments lead to very different architectures, and the question often is: how do you design a large-scale data/analytics solution? Or, what is best practice?
Since all the components, such as servers, interfaces and buffer systems, are modular, you’ll find very different architectures, which mostly fall into these three categories:
1. Centralized; all data are pushed to a central location
2. Decentralized; data remain local
3. Any combination of the above
There are pros and cons to each configuration, which can be summarized as follows:
Centralized: In this scenario, all data are pushed into a central server, and all client applications run on this system. Cloud systems also fall into this category. The main advantages are:
· Low Cost and Maintenance
· Accessibility: Data and System Access
· Data can be merged from different sub systems
· Scalability
On the flip side you will find that:
· Systems are less reliable; every component you add will reduce the overall reliability
· Latency; data must be pushed from the source to the destination system
· Data Sequence; buffering can lead to artifacts in the data flow
· Data quality; enterprise-level data are typically normalized into SQL tables or resampled, which loses detail
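The reliability point can be made concrete: when data must pass through several serially connected components, the end-to-end availability is the product of the individual availabilities, so every added hop lowers it. A minimal sketch, with purely illustrative component names and values:

```python
# End-to-end availability of a serial data pipeline:
# data passes sensor gateway -> plant buffer -> network -> central server,
# so the overall availability is the product of the parts.
# The individual values below are illustrative, not measured.
from math import prod

components = {
    "sensor gateway": 0.999,
    "plant buffer":   0.995,
    "network link":   0.998,
    "central server": 0.999,
}

overall = prod(components.values())
# The chain is less available than its weakest single component.
print(f"Overall availability: {overall:.4f}")
```

With these illustrative numbers the chain lands at roughly 99.1%, below every individual component.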
The opposite is true for decentralized solutions, where data are processed on the local or plant level. Most of these systems are located on the automation layer, which is difficult to access and demands higher security settings. But since you are closer to the source, you will find minimal latency, higher system reliability and excellent data quality.
The challenge is often that there is no clear separation of project types, and therefore of their specific data needs, which leads to one data architecture that must serve all projects. In a common scenario, you would separate your projects into at least two categories:
Enterprise Level Analytics:
· Abstract data, such as data aggregates
· Slow-moving data: time frames of minutes, hours, days
· Application: optimize business processes & operations
Process Analytics:
· Specific or raw data, e.g. sensor or QC data
· Fast-moving data: time frames of milliseconds, seconds or minutes
· Application: process tracking and optimization
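The split between the two categories can be sketched in a few lines: fast-moving raw sensor readings are reduced locally into per-minute aggregates, which is the kind of abstract, slow-moving data an enterprise-level system would consume. The function name and the choice of aggregates (min/mean/max/count) are illustrative:

```python
# Reduce fast-moving raw sensor readings (second resolution) into
# slow-moving per-minute aggregates for an enterprise-level system.
from collections import defaultdict

def aggregate_per_minute(readings):
    """readings: list of (epoch_seconds, value) tuples."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts // 60].append(value)  # group readings by whole minute
    return {
        minute: {
            "min": min(vals),
            "max": max(vals),
            "mean": sum(vals) / len(vals),
            "count": len(vals),
        }
        for minute, vals in sorted(buckets.items())
    }

# Illustrative raw data: (seconds, sensor value)
raw = [(0, 20.1), (15, 20.4), (59, 20.2), (61, 21.0), (95, 20.8)]
print(aggregate_per_minute(raw))
```

Running the aggregation at the plant level keeps the raw, fast-moving data close to the source and pushes only the compact aggregates upstream.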
Depending on the data requirements, the resulting architecture is typically a blend of decentralized and centralized solutions.
A good way to structure projects is to collect metrics such as:
· How many source systems need to be merged?
· What type of data and model will be used?
· Data density, in data points per minute or hour
· What is the required uptime/system reliability? How many nines do you need?
· How often does the model need to run/update? Model execution and update times are important parameters.
· and others …
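The "number of nines" metric maps directly to the downtime you can tolerate per year, which makes it easy to compare against a project's actual requirements. A quick sketch:

```python
# Convert "number of nines" of availability into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years

def downtime_minutes_per_year(nines: int) -> float:
    availability = 1 - 10 ** (-nines)  # e.g. 3 nines -> 0.999
    return MINUTES_PER_YEAR * (1 - availability)

for n in (2, 3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):8.1f} min/year downtime")
```

Three nines allow roughly 525 minutes of downtime per year; each additional nine cuts that by a factor of ten, which usually translates into a steep cost increase.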
These metrics will help you decide where to place models/applications and what data streams are required.