Architecture for Data Exploration using Microsoft Azure

Context

I have decided to write this blog post as quite a few of my clients are asking me how best to enable a data culture inside their organizations. Satya Nadella says: “this is where I think we as leaders of our businesses get to do the most transformative thing, which is to first make sure that everyone inside of the organization has the tools, has the capability to be able to gain these insights and then we empower them to act on those insights. This data culture is very much the journey that Microsoft itself is on.”

To enable a data culture, everyone must be able to access the data they need when they need it. For this, we need a BI architecture built for data exploration.


The diagram above illustrates the different approaches taken by traditional, deductive BI versus the new trend of exploratory BI, which is inductive. The usual approach for a data warehousing project is to start from company strategy and then capture business requirements, such as those produced by the balanced scorecard methodology. These requirements are then broken down into technical requirements such as KPIs, which are implemented using ETL, data warehouse (DW), and OLAP technologies. This top-down, deductive method works very well for descriptive and diagnostic analyses.

In our fast-paced world, where agility is key, there is a new way to do BI through inductive, bottom-up methods. This lets us move up the value chain to predictive and prescriptive analytics, maximizing the value of an organization's informational assets. The inductive approach starts from the unprocessed sources of information, whether or not they are co-located in a data lake, so that business users can observe and play with the data. With the right tools they can detect patterns, form hypotheses, and then confirm them.

Far from being an either/or choice, deductive BI and inductive BI complement each other and form part of a modern data warehouse strategy. Both sets of technologies enable scenarios that are valuable to the organization, and the top-down, authoritative metrics of an organization are not threatened by the flexibility and agility that an inductive approach allows. At Microsoft we believe that most organizations will benefit from both types of analytics.


Azure Data Catalog

Let’s now consider how to enable our business users with inductive BI for data exploration.


The first step is to enable business analysts to find the data inside the organization. I refer to this as the last-mile problem: how do business users know where to connect to get the data they need in order to make a decision? Azure Data Catalog solves this challenge by indexing Microsoft and non-Microsoft sources, relational and non-relational sources, and on-premises and cloud sources. Only the index information goes to the cloud, enabling business users to search for data from a simple web page.

Far from being a static, obsolete data dictionary on a seldom-updated wiki page, Azure Data Catalog enables business users to refine the metadata, for example by describing how a column can be useful for analysis. Furthermore, tags can be added to indexed sources to highlight Production and Certified data sources as a mechanism to enforce governance on the data.

Another benefit of the Data Catalog is that it does not require consolidating all the datasets in a central location such as a data lake. The catalog indexes the information where it lives and, if this is acceptable to administrators, users find the information in the catalog using the tool of their choice and connect directly to the database using their local workstation credentials. Although this means that end-user analytics will put load on the database, this can be a good trade-off when one does not want to copy and load all the sources into a central data lake.

Regarding client tools, Azure Data Catalog integrates into existing software and processes through open REST APIs. This means that common tools such as Excel, Power BI Desktop, and Tableau can all get more value from your enterprise data assets. Business analysts can then publish their reports to their business user audience.
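As a sketch of what such an integration might look like, the snippet below builds a search request URL for the Data Catalog REST search endpoint. The catalog name `DefaultCatalog`, the endpoint shape, and the API version are assumptions for illustration; consult the Data Catalog REST API reference for the exact contract, and note that real requests must also carry an Azure AD bearer token.

```python
from urllib.parse import urlencode

def build_catalog_search_url(catalog_name: str, search_terms: str,
                             api_version: str = "2016-03-30") -> str:
    """Build a search URL for the Azure Data Catalog REST search API.

    The endpoint shape below is an assumption for illustration; a real
    request must also include an Azure AD bearer token header.
    """
    base = f"https://api.azuredatacatalog.com/catalogs/{catalog_name}/search/search"
    query = urlencode({"searchTerms": search_terms, "api-version": api_version})
    return f"{base}?{query}"

# Example: search the (hypothetical) DefaultCatalog for certified sales sources
url = build_catalog_search_url("DefaultCatalog", "sales tags:Certified")
print(url)
```

Any HTTP client (or a tool wired to the REST API) could then issue a GET against that URL to retrieve matching assets.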

Azure Data Lake


We now add the concept of the data lake to the architecture. As you can see, we have to copy our information to the data lake service. For business analysts, nothing changes: they can still find what they need through the data catalog. However, this enables a new persona, the data scientist, to do advanced analytics and machine learning using a variety of tools.

So what is the Azure Data Lake Store? In summary, Azure Data Lake Store is HDFS for the cloud! Imagine an HDFS-compatible, cloud-provisioned storage service that any R or Hadoop distribution can use for storage. It is deeply integrated with the rest of the Cortana Intelligence Suite services. The goal is to provide an abstraction layer for big data storage that is unlimited (no fixed limit on account size, no limit on file size), high-throughput (ultra-fast reads and writes for analytics), and low-latency (optimized for real-time scenarios). At Microsoft we used the core of this technology internally for years, with multiple exabytes of data, before making it available on Azure.
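Because the store is HDFS-compatible, it also exposes a WebHDFS-style REST surface. The helper below sketches how a client might build the request URL for reading a file; the account name and file path are hypothetical, and real calls additionally require Azure AD authentication (the official SDKs handle that for you).

```python
from urllib.parse import quote

def build_adls_webhdfs_url(account: str, path: str, operation: str = "OPEN") -> str:
    """Build a WebHDFS-style URL for a file in Azure Data Lake Store.

    The endpoint shape is a sketch; real requests need an OAuth bearer token.
    """
    path = path.lstrip("/")
    return (f"https://{account}.azuredatalakestore.net/webhdfs/v1/"
            f"{quote(path)}?op={operation}")

# Hypothetical account and file path:
url = build_adls_webhdfs_url("contosolake", "/clickstream/2016/02/events.json")
print(url)
```

Because the surface is WebHDFS-compatible, Hadoop tooling that already speaks WebHDFS can address the same paths without custom code.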

The goal is for the data scientist to discover new insights from the data. And when something valuable is discovered, it can then be incorporated back into the traditional BI systems with ETL, either from the data lake or directly at the sources.


Azure SQL Data Warehouse

The new insights can be integrated back into a traditional on-premises database, but why not extend the benefits of the cloud to the data warehouse itself? Azure SQL Data Warehouse is an elastic warehouse-as-a-service with enterprise-class features: petabyte scale with massively parallel processing, independent scaling of compute and storage (in seconds), and the full enterprise-class SQL Server experience with T-SQL compatibility.

The great thing about this technology is that you pay only for what you need. If the warehouse is not active during the weekend, with low usage at night, medium usage during the day, and high usage at month-end, you can scale the service to match your needs and even pause the compute nodes to avoid paying for resources you are not using. This is breakthrough query price/performance that is only possible in the cloud!
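That pay-for-what-you-use pattern can be automated. The sketch below, with a made-up schedule and illustrative DWU levels, picks a target compute level (or pause) from the day and hour; in practice the chosen objective would then be applied via T-SQL (`ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...)`) or the Azure management APIs.

```python
def pick_service_objective(weekday: int, hour: int, month_end: bool = False) -> str:
    """Return a target compute level for the warehouse at a given time.

    weekday: 0 = Monday .. 6 = Sunday. The DWU levels and schedule here
    are illustrative assumptions, not a sizing recommendation.
    """
    if weekday >= 5:           # weekend: warehouse idle, pause compute
        return "PAUSED"
    if month_end:              # month-end close: peak reporting load
        return "DW1000"
    if 8 <= hour < 18:         # business hours: medium usage
        return "DW400"
    return "DW100"             # nights: light ETL only

print(pick_service_objective(weekday=2, hour=10))   # midweek, business hours
print(pick_service_objective(weekday=6, hour=10))   # weekend
```

A scheduler (for example an Azure Automation runbook) could call logic like this on the hour and issue the scale or pause request accordingly.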

Conclusion

To conclude, I encourage you to visit the home page of the Cortana Intelligence Suite to discover the full potential of Microsoft's cloud data platform. The services presented here are building blocks that can be assembled into solutions that fit your needs. You can find industry solution templates directly on that site, with more to come in the future.

