IT Automation, Design a Data Repository
Why is an automation data repository needed?
Traditionally, organizations have taken an elemental, piecemeal approach to IT automation: scripts or individual automation tools are deployed to solve point problems. While this approach may work in the short term and deliver quick results, it inevitably leads to disparate solutions that don’t communicate with one another, reducing the efficiency of end-to-end automation workflows.
In the “IT Automation, Build a Framework” article, we laid out an IT automation framework that takes a tiered approach. Within that framework, various IT automation tools and other infrastructure tools generate large volumes of data. In our quest for a data-driven approach, the automation data repository is the key component of the framework.
In a modern data management infrastructure, near-real-time analytics can often be performed directly on data where it lives, eliminating the need to aggregate it into a centralized location; data stores such as Hadoop support building an end-to-end analytics infrastructure this way. So why duplicate the IT automation data and integrate it into one place?
First, the data flow we are dealing with would not be considered ‘big data’ in terms of volume. In our tiered approach to IT automation, we recognize that an IT automation framework cannot be put in place overnight, yet top-down metrics reporting is needed from day one to justify ROI. A NoSQL database that absorbs all the data flow first and supports new uses later seems a much better alternative.
Second, we have to solve the potential lock-in problem. As one of the key automation architects in my current organization, I have had many chances to review vendor offerings for full IT automation suites. The problem with a full automation suite is that it is very ‘intrusive’, since it requires data collection at every infrastructure component level. Any automation engine has to rely on data to be ‘smart’, so most likely a hidden vendor database sits in the back end anyway. Every company’s needs and environment are different, and IT automation data integrated and stored in a vendor application creates a lock-in risk that many senior IT executives are not willing to take.
An architectural approach to IT automation requires consolidating and coordinating silos of automation within a single framework. Owning your own automation repository gives IT the agility it needs and the flexibility to implement multiple automation solutions simultaneously.
Finally, we need to address API and automation tool dependency, as described in our IT automation framework article. The IT automation tool market is booming, and in many automation/robotics areas there is no clear market leader. Your tool choice today may not be the best choice down the road. If a tool is replaced, we certainly want to retain the previous run history; that IT automation data is essential to maintaining a full picture for data analytics and possible AI solutions.
On the automation tool dependency side, a unified, single-tier API created for our automation data repository eliminates the need to build point-to-point APIs between different automation tools. As the key component of the automation framework, our automation repository provides the virtual tier that interacts with all the automation tools, past, present, and future.
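To make the idea concrete, here is a minimal sketch of what such a single-tier ingestion API could look like. This is an illustrative assumption, not our actual implementation: the endpoint path, field names, key convention, and the in-memory dictionary standing in for the Couchbase back end are all hypothetical.

```python
# Minimal sketch of a unified, single-tier API in front of the automation
# data repository. All names (endpoint, fields, key format) are assumptions.
from datetime import datetime, timezone

from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the automation data repository back end (Couchbase in our design).
REPOSITORY = {}


@app.route("/runs", methods=["POST"])
def ingest_run():
    """Every automation tool posts its run data to this one endpoint,
    so tools never need point-to-point integrations with each other."""
    record = request.get_json()
    ts = datetime.now(timezone.utc).isoformat()
    # Source-based key naming convention: tool::workflow::timestamp
    key = f"{record['tool']}::{record['workflow']}::{ts}"
    REPOSITORY[key] = {**record, "imported_at": ts}
    return jsonify({"key": key}), 201
```

Because every tool talks to the same endpoint, swapping out an automation tool later only means pointing the new tool at the repository API, while the accumulated run history stays in place.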
As illustrated in the figure below, our automation repository sits in the automation layer and serves as the central data source for all the inputs and outputs of our automated workflows.
Design an automation repository
Data is always at the core of IT automation, and integration across structured and unstructured data is sometimes necessary. Therefore, a NoSQL document database is more suitable than a traditional relational database.
We chose Couchbase as our back-end database. As shown in the figure below, different data sources are imported into the Couchbase cluster, each with its own data source key naming convention.
As illustrated in the figure below, the different data flows are imported into the Couchbase back end in JSON format.
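The sketch below shows what a single imported run might look like, assuming a document shape, bucket name, and key convention of my own invention for illustration. The import paths follow the Couchbase Python SDK 3.x/4.x style and may need adjusting for other SDK versions.

```python
# Illustrative import of one automation run into Couchbase. Bucket name,
# document fields, and the "tool::workflow::timestamp" key are assumptions.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions  # location varies by SDK version

cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("svc_automation", "password")),
)
collection = cluster.bucket("automation").default_collection()

# One automation/RPA run, flattened into a JSON document.
run_doc = {
    "type": "rpa_run",
    "tool": "uipath",              # source system (hypothetical example)
    "workflow": "reset_vm_pool",
    "status": "success",
    "duration_sec": 42.7,
    "started_at": "2019-06-01T02:15:00Z",
}

# The key encodes the data source, so documents from different tools
# can coexist in the same bucket without colliding.
doc_key = f"{run_doc['tool']}::{run_doc['workflow']}::{run_doc['started_at']}"
collection.upsert(doc_key, run_doc)
```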
A lightweight reporting interface reports data from the automation data repository. The console provides customized reporting catering to different audiences. We also tested a very simple Microsoft Power BI interface for cases where only run time and success ratio are needed.
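For the simple run-time/success-ratio case, a single N1QL query over the repository is enough. The sketch below reuses the hypothetical bucket and field names from the import example above; it is not our production reporting console.

```python
# Hedged sketch: run time and success ratio per tool/workflow, straight
# from the repository via N1QL (bucket and fields as assumed above).
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("svc_automation", "password")),
)

query = """
    SELECT r.tool,
           r.workflow,
           COUNT(*)            AS runs,
           AVG(r.duration_sec) AS avg_duration_sec,
           SUM(CASE WHEN r.status = 'success' THEN 1 ELSE 0 END) / COUNT(*)
                               AS success_ratio
    FROM automation AS r
    WHERE r.type = 'rpa_run'
    GROUP BY r.tool, r.workflow
"""

for row in cluster.query(query):
    print(row)
```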
With all the automation/RPA workflow run data in one place, we also tested some straightforward machine learning on the H2O AI platform. Using a Gradient Boosting Machine (GBM) model, we achieved very good results on variance detection. At this time, the volume of run data limits the usefulness of trying a deep learning model. I will cover our AI/machine learning example in a separate article.
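For readers who want a feel for the kind of experiment this involves, here is a minimal GBM sketch on H2O. It is not our actual model: the exported CSV file, the feature columns, and the labeled "is_anomalous" target are assumptions for illustration only.

```python
# Minimal H2O GBM sketch for variance/anomaly detection on exported run data.
# The CSV export, column names, and target label are hypothetical.
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Run history exported from the automation repository (hypothetical file).
runs = h2o.import_file("automation_runs.csv")
runs["is_anomalous"] = runs["is_anomalous"].asfactor()  # labeled variance flag

predictors = ["tool", "workflow", "duration_sec", "hour_of_day"]
target = "is_anomalous"

train, valid = runs.split_frame(ratios=[0.8], seed=42)

gbm = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, seed=42)
gbm.train(x=predictors, y=target, training_frame=train, validation_frame=valid)

print(gbm.auc(valid=True))
```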
-END #automation