Success in Shifting Analytics to the Cloud

With numerous new options available for data analytics, more companies are looking at the need to upgrade outdated and restrictive platforms—both hardware and software. They see the opportunity to create state-of-the-art capabilities using increasingly viable options in cloud technology, Big Data analytics, machine learning, and visualization tools.

For example, a company recently encountered a common problem: its existing analytics platform (SAS, in this case) was hitting the upper limit of its capacity, limiting the company's ability to serve current customers faster (agility) and to quickly and easily add new data suppliers (responsiveness). Even worse, the existing platform offered a limited range of capabilities. Built to accommodate only structured data, it gave no option to analyze unstructured data from a multitude of sources, including clickstream, social media, email, and documents.

A Cloud-Based Analytics Platform

Moving data analytics capabilities from an on-premises legacy platform to a cloud-based architecture opens up considerable capabilities. One can immediately take advantage of a highly extensible infrastructure to address the lack of agility, while saving money by using on-demand instances rather than installing and maintaining an on-site system. The new platform also brought additional capabilities: it sped up the flexible ingestion of a multitude of data formats and improved access to sophisticated predictive analytics.

The architecture of a Cloud-Based Analytics Platform involves five layers that need to be carefully designed: Infrastructure, Persistence, Integration, Analytics, and Visualization.

The Infrastructure layer in this instance was Amazon Web Services (AWS), though other options are available, such as Microsoft Azure and Google Cloud.

The Persistence layer includes different services to manage repositories such as Hadoop, MySQL, and PostgreSQL, and it manages interaction with storage systems such as Amazon S3. This case study utilized Amazon Elastic MapReduce (EMR), a service that uses open-source Hadoop software to process data from both structured and unstructured sources, and Apache Hive, an open-source “data warehouse” that allows users to query and manage large data sets in distributed storage. This allowed the use of cheaper Amazon S3 to store and archive data, moving it into HDFS only during active processing.
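As a sketch of how the S3-archive / HDFS-active pattern might be wired up in Hive (the table names, columns, and S3 bucket below are hypothetical placeholders, not the case study's actual schema), one could register an external Hive table whose data remains in S3 and a managed table materialized in HDFS for active processing:

```python
# Sketch of the S3-archive / HDFS-active-processing pattern described above.
# Table names, columns, and the S3 bucket are hypothetical placeholders; the
# generated HiveQL would be submitted to Hive on an EMR cluster.

def external_s3_table_ddl(table, columns, s3_path):
    """HiveQL for an EXTERNAL table whose data stays in (cheaper) S3."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{s3_path}'"
    )

def managed_hdfs_table_ddl(table, source_table):
    """HiveQL for a managed table in HDFS, populated only for active processing."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} STORED AS ORC "
        f"AS SELECT * FROM {source_table}"
    )

archive_ddl = external_s3_table_ddl(
    "clickstream_archive",
    [("event_time", "TIMESTAMP"), ("user_id", "STRING"), ("url", "STRING")],
    "s3://example-bucket/clickstream/",  # hypothetical bucket
)
active_ddl = managed_hdfs_table_ddl("clickstream_active", "clickstream_archive")
print(archive_ddl)
print(active_ddl)
```

Dropping the managed table after a processing run frees the (more expensive) cluster storage while the external table keeps the archive queryable in place.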

The Integration layer manages data acquisition, transformation, quality, persistence, consumption, and governance. One such technique is known as Metadata-Driven Transformation[i] (MDT), which enables users to process data using characteristics gleaned from the data itself, such as type, source, and other facets. This data ingestion technique is easily configurable: modifications necessitated by evolving data characteristics are handled through configuration changes rather than extensive coding, making it much faster to set up and maintain than traditional techniques.
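As a minimal illustration of the metadata-driven idea (the registry keys and transform functions here are invented for illustration, not the actual MDT implementation), the transform applied to a record can be looked up from configuration at run time, keyed on metadata gleaned from the record itself:

```python
# Toy sketch of metadata-driven transformation: the transform applied to a
# record is selected at run time from a configuration, keyed on metadata
# about the record (its source and type). On-boarding a new data supplier
# then means adding a registry entry, not writing new pipeline code.

TRANSFORM_REGISTRY = {
    ("clickstream", "json"): lambda rec: {k.lower(): v for k, v in rec.items()},
    ("email", "text"):       lambda rec: {"body": rec["body"].strip()},
}

def ingest(record, metadata):
    """Dispatch on (source, type) metadata; unknown combinations pass through."""
    transform = TRANSFORM_REGISTRY.get(
        (metadata["source"], metadata["type"]), lambda rec: rec
    )
    return transform(record)

row = ingest({"URL": "/home", "User": "42"},
             {"source": "clickstream", "type": "json"})
# row == {"url": "/home", "user": "42"}
```

In a production pipeline the registry would live in external configuration rather than code, which is what makes the approach faster to maintain than hard-coded ETL mappings.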

The Analytics layer is responsible for modeling, algorithms, and managing analytics pipelines. This layer expanded the analytics available to the business by combining Big Data with an enhanced Recommendation[ii] algorithm on Spark, which used an expanded set of attributes (including demographics and historic engagement) to improve the accuracy of the results.
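In the same spirit (though this is a toy sketch, not the case study's actual Spark implementation, and the weights and feature names are assumptions), a recommendation score that folds demographics and historic engagement into a collaborative-filtering signal might blend the inputs like this:

```python
# Toy sketch of blending a collaborative-filtering score with extra user
# attributes (demographic affinity, historic engagement), as the article
# describes. Weights and features are illustrative, not the production model.

def blended_score(cf_score, demographic_affinity, engagement_rate,
                  w_cf=0.6, w_demo=0.2, w_eng=0.2):
    """Weighted blend of a collaborative score with attribute signals.

    All inputs are assumed normalized to [0, 1]; weights sum to 1.
    """
    return (w_cf * cf_score
            + w_demo * demographic_affinity
            + w_eng * engagement_rate)

# Rank two candidate items for one user: item_b wins once the extra
# attributes are considered, despite a weaker collaborative signal.
candidates = {
    "item_a": blended_score(cf_score=0.9, demographic_affinity=0.1,
                            engagement_rate=0.2),
    "item_b": blended_score(cf_score=0.6, demographic_affinity=0.9,
                            engagement_rate=0.8),
}
best = max(candidates, key=candidates.get)
```

At production scale the same blend would run as a Spark job over the full user base, with the collaborative score coming from a model such as ALS.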

The Visualization layer handles user interaction (querying and presentation) with the results of Analytics. Tableau was integrated with a Hive data store to accelerate ad-hoc analysis and the creation of dashboards for distribution to a broader audience. Note that one can also use alternatives such as Impala for faster access at increased cost.

Foundation for the Future

Overall, the solution melded technological capabilities with business needs, creating a foundational Data Analytics platform that supported the growth in business. Several benefits were realized: (a) overall processing time decreased by two-thirds; (b) current needs were served at lower cost, with support staff reduced by half; (c) processed data became easier to visualize; (d) more sophisticated analysis was enabled; (e) new data suppliers were on-boarded more quickly, with varied data formats from new channels accommodated via MDT; (f) increases in data volume were handled comfortably; and (g) prediction accuracy improved via the extended algorithm.

More importantly, if the solution architecture is designed well, the foundational Analytics platform can be easily extended to solve issues in other business areas, from Research and Manufacturing/Maintenance to Agile Marketing.

RELATED CONTENT: Four Steps to Get Started With Predictive Analytics



Sanjay Bhasin is a seasoned Analytics professional who has created solutions for Pharmaceutical, Healthcare, Consumer Goods and Financial Services clients leveraging Predictive Analytics, Machine Learning and Business Intelligence technologies. We welcome your questions or feedback on this topic.


[i] In contrast to classical ETL (extract, transform, and load) processes, which require the specification of exact mappings for all data conditions at design time, MDT is flexible: it selects a transform at run time based on triggers such as data value and source.

[ii] Recommendation engines are commonly used by companies such as Netflix for providing suggestions of items based on consumer preferences.
