Architecture for Data Exploration using Microsoft Azure

Context

I have decided to write this blog post as quite a few of my clients are asking me how best to enable a data culture inside their organizations. Satya Nadella says: “this is where I think we as leaders of our businesses get to do the most transformative thing, which is to first make sure that everyone inside of the organization has the tools, has the capability to be able to gain these insights and then we empower them to act on those insights. This data culture is very much the journey that Microsoft itself is on.”

To enable a data culture, everyone must be able to access the data they need when they need it. For this, we need a BI architecture built for data exploration.


The diagram above illustrates the different approaches taken by traditional, deductive BI versus the new trend of exploratory BI, which is inductive. The usual approach for a data warehousing project is to start from company strategy and then capture business requirements, such as those produced by the balanced scorecard methodology. These requirements are then broken down into technical requirements such as KPIs, which are implemented using ETL, data warehouse (DW), and OLAP technologies. This top-down, deductive method works very well for descriptive and diagnostic analyses.

In our fast-paced world, where agility is key, there is a new way to do BI through inductive, bottom-up methods. This lets us move up the value chain to predictive and prescriptive analytics, maximizing the value of an organization's informational assets. The inductive approach starts from the unprocessed sources of information, whether or not they are co-located in a data lake, so that business users can observe and play with the data. With the right tools they can detect patterns, form hypotheses, and then confirm them.

Far from being an either/or choice, deductive BI and inductive BI complement each other and form part of a modern data warehouse strategy. Both sets of technologies enable scenarios that are valuable to the organization, and the top-down, authoritative metrics of an organization are not threatened by the flexibility and agility that an inductive approach allows. At Microsoft we believe that most organizations will benefit from both types of analytics.


Azure Data Catalog

Let’s now consider how to enable our business users with inductive BI for data exploration.


The first step is to enable business analysts to find the data inside the organization. I refer to this as the last-mile problem: how do business users know where to connect to get the data they need in order to make a decision? Azure Data Catalog solves this challenge by indexing Microsoft and non-Microsoft sources, relational and non-relational sources, and on-premises and cloud sources. Only the index information goes to the cloud, enabling business users to search for data from a simple web page.

Far from being a static, obsolete data dictionary on a seldom-updated wiki page, Azure Data Catalog enables business users to refine the metadata, for example by describing how a column can be useful for analysis. Furthermore, tags can be added to indexed sources to highlight Production and Certified data sources as a mechanism to enforce governance on the data.

Another benefit of the Data Catalog is that it does not require consolidating all the datasets in a central location such as a data lake. The catalog indexes the information where it lives and, if this is acceptable to administrators, users find the information in the catalog using the tool of their choice and connect directly to the database using their local workstation credentials. Although this means that end-user analytics will put load on the database, this can be a good trade-off when one does not want to copy and load all the sources into a central data lake.

Regarding client tools, Azure Data Catalog integrates into existing software and processes through open REST APIs. This means that common tools such as Excel, Power BI Desktop, and Tableau can all get more value from your enterprise data assets. Business analysts can then publish their reports to their business user audience.
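As a sketch of what such an integration might look like, the snippet below builds a search request URL for the Data Catalog REST search endpoint. The catalog name `DefaultCatalog`, the endpoint shape, and the API version are assumptions for illustration; consult the Data Catalog REST API reference for the exact contract, and note that real requests must also carry an Azure AD bearer token.

```python
from urllib.parse import urlencode

def build_catalog_search_url(catalog_name: str, search_terms: str,
                             api_version: str = "2016-03-30") -> str:
    """Build a search URL for the Azure Data Catalog REST search API.

    The endpoint shape below is an assumption for illustration; a real
    request must also include an Azure AD bearer token header.
    """
    base = f"https://api.azuredatacatalog.com/catalogs/{catalog_name}/search/search"
    query = urlencode({"searchTerms": search_terms, "api-version": api_version})
    return f"{base}?{query}"

# Example: search the (hypothetical) DefaultCatalog for certified sales sources
url = build_catalog_search_url("DefaultCatalog", "sales tags:Certified")
print(url)
```

Any HTTP client (or a tool wired to the REST API) could then issue a GET against that URL to retrieve matching assets.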

Azure Data Lake


We now add the concept of the data lake to the architecture. As you can see, we have to copy our information to the data lake service. For business analysts, nothing changes: they can still find what they need through the data catalog. However, this enables a new persona, the data scientist, to do advanced analytics and machine learning using a variety of tools.

So what is the Azure Data Lake Store? In summary, Azure Data Lake Store is HDFS for the cloud! Imagine an HDFS-compatible, cloud-provisioned storage service that any R or Hadoop distribution can use for storage. It is deeply integrated with the rest of the Cortana Intelligence Suite services. The goal is to provide an abstraction layer for big data storage that is unlimited (no fixed limit on account size, no limit on file size), high-throughput (ultra-fast reads and writes for analytics), and low-latency (optimized for real-time scenarios). At Microsoft we used the core of this technology internally for years, with multiple exabytes of data, before making it available on Azure.
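Because the store is HDFS-compatible, it also exposes a WebHDFS-style REST surface. The helper below sketches how a client might build the request URL for reading a file; the account name and file path are hypothetical, and real calls additionally require Azure AD authentication (the official SDKs handle that for you).

```python
from urllib.parse import quote

def build_adls_webhdfs_url(account: str, path: str, operation: str = "OPEN") -> str:
    """Build a WebHDFS-style URL for a file in Azure Data Lake Store.

    The endpoint shape is a sketch; real requests need an OAuth bearer token.
    """
    path = path.lstrip("/")
    return (f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
            f"{quote(path)}?op={operation}")

# Hypothetical account and file path:
url = build_adls_webhdfs_url("contosolake", "/clickstream/2016/02/events.json")
print(url)
```

Because the surface is WebHDFS-compatible, Hadoop tooling that already speaks WebHDFS can address the same paths without custom code.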

The goal is for the data scientist to discover new insights from the data. And when something valuable is discovered, it can then be incorporated back into the traditional BI systems with ETL, either from the data lake or directly at the sources.


Azure SQL Data Warehouse

The new insights can be integrated back into a traditional on-premises database, but why not extend the benefits of the cloud to the data warehouse itself? Azure SQL Data Warehouse is an elastic warehouse-as-a-service with enterprise-class features: petabyte scale with massively parallel processing, independent scaling of compute and storage (in seconds), and the full enterprise-class SQL Server experience with T-SQL compatibility.

The great thing about this technology is that you pay only for what you need. If the warehouse is not active during the weekend, with low usage at night, medium usage during the day, and high usage at month-end, you can scale the service to match your needs and even pause the compute nodes to avoid paying for resources you are not using. This is breakthrough query price/performance that is only possible in the cloud!
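That pay-for-what-you-use pattern can be automated. The sketch below, with a made-up schedule and illustrative DWU levels, picks a target compute level (or pause) from the day and hour; in practice the chosen objective would then be applied via T-SQL (`ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...)`) or the Azure management APIs.

```python
def pick_service_objective(weekday: int, hour: int, month_end: bool = False) -> str:
    """Return a target compute level for the warehouse at a given time.

    weekday: 0 = Monday .. 6 = Sunday. The DWU levels and schedule here
    are illustrative assumptions, not a sizing recommendation.
    """
    if weekday >= 5:           # weekend: warehouse idle, pause compute
        return "PAUSED"
    if month_end:              # month-end close: peak reporting load
        return "DW1000"
    if 8 <= hour < 18:         # business hours: medium usage
        return "DW400"
    return "DW100"             # nights: light ETL only

print(pick_service_objective(weekday=2, hour=10))   # midweek, business hours
print(pick_service_objective(weekday=6, hour=10))   # weekend
```

A scheduler (for example an Azure Automation runbook) could call logic like this on the hour and issue the scale or pause request accordingly.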

Conclusion

To conclude, I encourage you to visit the home page of the Cortana Intelligence Suite to discover the full potential of Microsoft's cloud data platform. The services presented here are building blocks that can be assembled into solutions that fit your needs. You can find industry solution templates directly on that site, with more to come in the future.

