Big Data – Tez, MR, Spark Execution Engine : Performance Comparison
There is no question that data is being generated in greater volumes than ever before. Along with traditional data sets, new data sources such as sensors, application logs, IoT devices, and social networks are adding to data growth. Unlike traditional ETL platforms such as Informatica, ODI, and DataStage, which are largely proprietary commercial products, the majority of Big ETL platforms are powered by open source.
With so many execution engines available, customers are always curious about how they are used and how they perform.
To put this into perspective, in this post I run a set of queries against three key execution engines, namely Tez, MapReduce (MR), and Spark, and compare the query execution timings.
sensordata_csv table definition:
create external table sensordata_csv (ts string, deviceid int, sensorid int, val double)
row format delimited fields terminated by '|'
stored as textfile
location '/user/sranka/MachineData/sensordata';
sensordata_part table definition:
create table sensordata_part (deviceid int, sensorid int, val double)
partitioned by (ts string)
clustered by (deviceid) sorted by (deviceid) into 10 buckets
stored as orc;
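The post does not show how sensordata_part was populated. One plausible way, sketched here as an assumption rather than the author's actual steps, is a dynamic-partition insert from sensordata_csv. The settings below are standard Hive properties, and the assumption is that ts holds a date-like string such as '2014-01-01', so each distinct value becomes one partition.

```sql
-- Sketch only: assumes sensordata_part is loaded from sensordata_csv.
-- Allow Hive to create partitions from the values found in the data.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Needed on Hive 1.x so the insert honors the bucket definition;
-- Hive 2.x and later enforce bucketing without this setting.
set hive.enforce.bucketing=true;

-- The partition column (ts) must come last in the select list.
insert overwrite table sensordata_part partition (ts)
select deviceid, sensorid, val, ts
from sensordata_csv;
```

If ts actually carries a full timestamp, this would create one partition per distinct timestamp, so the value would need to be truncated to a date first.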
Query 1: select count(*) from sensordata_csv where ts = '2014-01-01';
Query 2: select count(*) from sensordata_part where ts = '2014-01-01';
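The post does not say how the engine was selected for each run. If the queries are submitted through Hive (an assumption, as is having Tez and Hive on Spark configured on the cluster), the engine can be switched per session with the hive.execution.engine property. A minimal sketch:

```sql
-- Sketch only: assumes the queries run in a Hive session (CLI or Beeline)
-- on a cluster where Tez and Hive on Spark are both set up.

-- Run both queries on MapReduce.
set hive.execution.engine=mr;
select count(*) from sensordata_csv where ts = '2014-01-01';
select count(*) from sensordata_part where ts = '2014-01-01';

-- Run the same queries on Tez.
set hive.execution.engine=tez;
select count(*) from sensordata_csv where ts = '2014-01-01';
select count(*) from sensordata_part where ts = '2014-01-01';

-- Run the same queries on Spark.
set hive.execution.engine=spark;
select count(*) from sensordata_csv where ts = '2014-01-01';
select count(*) from sensordata_part where ts = '2014-01-01';
```

Hive prints the elapsed time after each query, which is one way to collect timings like the ones compared here. The first query on Tez or Spark in a session usually includes the cost of starting the application, so repeating each query gives a fairer number.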
The tables below show the execution timings:
Conclusion: Which Engine Is Right?
Spark, being an in-memory execution engine, comes out as the clear winner. In certain scenarios, though, such as the query against the partitioned table here, the Tez execution engine comes close to Spark.
You cannot conclude from this that Spark will solve the "world hunger problem" of Big ETL. Spark is a continuously evolving product and has its own challenges when it comes to productionizing workloads; the same holds true for Tez. The MR engine has been around the longest and has been at the core of the Hadoop framework, so for mission-critical workloads that are not time-bound, MR could be the best choice.
For details, please read the blog post: https://goo.gl/Jv0QG4
Hope This Helps,
Sunil S Ranka
"Superior BI is the antidote to Business Failure"
About Spark : http://spark.apache.org/
About MapReduce : https://en.wikipedia.org/wiki/MapReduce
About Tez : https://tez.apache.org/