MPP comes to SQL Server
In the past, processing large amounts of data in SQL Server meant using an appliance called ADW (Analytics Data Warehouse), also commonly known as PDW (Parallel Data Warehouse). ADW is not just a special version of SQL Server but a whole appliance, including CPUs, memory and storage. It was very expensive and, because of the cost, wasn't used all that much. Expensive as it is, it is also very powerful. The reason it is so powerful is that it uses MPP, or Massively Parallel Processing. MPP divides any computing work over multiple processing nodes, each working on its own slice of highly partitioned data.
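To make the idea of dividing work over nodes with partitioned data concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how the appliance is implemented: threads stand in for processing nodes, and the function names (`partition_rows`, `local_sum`, `mpp_total`) and the sample `sales` rows are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition_rows(rows, node_count):
    """Hash-distribute (key, amount) rows so each node owns one slice."""
    partitions = [[] for _ in range(node_count)]
    for key, amount in rows:
        # crc32 is a stable hash, so a given key always lands on the same node.
        node = zlib.crc32(key.encode("utf-8")) % node_count
        partitions[node].append((key, amount))
    return partitions


def local_sum(partition):
    """Work done independently on one node: sum only its own slice."""
    return sum(amount for _, amount in partition)


def mpp_total(rows, node_count=4):
    """Scatter the rows, aggregate each slice in parallel, then combine."""
    partitions = partition_rows(rows, node_count)
    # Threads are a stand-in for separate compute nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partitions))
    # The coordinator only has to add up one small partial result per node.
    return sum(partials)


# Hypothetical sample data.
sales = [("east", 100), ("west", 250), ("north", 75), ("south", 125), ("east", 50)]
total = mpp_total(sales)
print(total)
```

The point of the design is that no single node ever has to scan all the data. Each one aggregates its own partition, and only the small partial results travel to the coordinator.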
To run the same sorts of workloads that ADW handles, you no longer have to accept the limitations and expenses incurred with ADW. The way to do this is to enable PolyBase in SQL Server 2016. While you will need the Enterprise Edition of SQL Server to do this, it is much cheaper and easier than you probably imagine. At its core, PolyBase is a SQL Server implementation of Hive over HDFS (Hadoop). If you are familiar with Hadoop and Hive, you know that the power of Hadoop is in its distributed file system and MapReduce over multiple processing nodes. Hadoop gives you processing power similar to MPP. If you are familiar with Hive, you know it provides a SQL interface that produces MapReduce jobs over your Hadoop cluster. Now imagine combining HDFS, MapReduce and T-SQL together. That is exactly what PolyBase does under SQL Server.
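For readers who have not seen how a SQL-style query maps onto MapReduce, the following Python sketch walks through the three phases for something like `SELECT region, COUNT(*) ... GROUP BY region`. It is a single-process illustration only. The function names and the `orders` rows are hypothetical, and it does not use Hive's or PolyBase's actual machinery.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair for every input row."""
    for region, _amount in rows:
        yield region, 1


def shuffle_phase(pairs):
    """Shuffle: bring together all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample data: (region, order amount).
orders = [("east", 100), ("west", 250), ("east", 50), ("north", 75)]
counts = reduce_phase(shuffle_phase(map_phase(orders)))
print(counts)
```

On a real cluster the map and reduce steps run on many nodes at once and the shuffle moves data between them. The sketch runs them in sequence only to show the shape of the job.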
With PolyBase, you get a massively scalable and powerful MPP engine for your data analytics needs in a familiar and easy-to-use SQL Server implementation. If you need more power, you can just add more nodes to your cluster. If you need the benefits of relational technology, they are there too. Just think of the many things you can accomplish by processing tons and tons of data for your data warehousing and analytics needs. The possibilities are endless.
If you would like to know more about PolyBase and how to architect a powerful analytics solution, please feel free to contact me.
Very nice article
Good.
MS has been trying to conjure up a credible MPP story since the acquisition of DATAllegro about a decade ago. Close. No cigar.
With Embedded R Engine in SQL 2016 and Enhancements in In Memory Analytics, this solution could be effective for Mission Critical Analytics which require more security controls.
The appliance you are referring to is APS (Analytics Platform System), which uses Massively Parallel Processing running SQL Server technology on each of its compute nodes to retrieve data from storage nodes. APS is based on relational data storage, on which technologies like columnstore are implemented. The introduction of PolyBase to APS allowed a single T-SQL query to process HDFS-stored data (Hadoop file system) in combination with the relational engine. In SQL Server 2016, PolyBase was introduced to the SQL Server SMP engine to execute against HDFS in a similar fashion to PolyBase on APS. It's a little misleading to say MPP comes to SQL Server, as the processing takes place on the HDFS cluster.