“An energy-aware scheduling algorithm for big data applications in Spark”, Cluster Computing [Hongjian et al., 2020]: Extended Abstract
Abstract— The purpose of this text is to provide an extended abstract of the article “An energy-aware scheduling algorithm for big data applications in Spark”, Cluster Computing [Hongjian et al., 2020], focusing on (1) identifying the problem the authors intend to solve; (2) the main reasons that make the problem interesting, relevant, and timely; (3) the approach taken to the problem; and finally (4) a discussion of the results that the authors hope to achieve with the proposed approach, as well as its limitations.
Keywords—Efficient Energy Use, Apache Spark, Scheduling, Algorithm
Identification of the Problem
The authors identify the problem of high carbon emissions caused by the increasing adoption and scale of large-scale processing systems. In Apache Spark, a widely adopted general-purpose big data processing framework, the assignment of tasks to executors follows either a FIFO (first-in-first-out) or FAIR (round-robin) scheduling strategy, without considering energy consumption as an evaluation criterion.
Reasons for the Problem's Interest, Relevance, and Timeliness
Historically, production modes have evolved towards efficiency. Efficiency does not only mean achieving maximum output with minimum input; it also means reducing waste and optimizing resource allocation, recognizing that natural resources are finite and that, to be sustainable, human processes and activities should not deplete the resources we will need in the future.
Since the 1970s [42], this has been a growing concern for natural and social scientists as well as for social movements. The field of computer engineering, however, has not treated it as a priority [2-3] and has only in recent years started to incorporate energy-use optimizations into computer systems, particularly large-scale processing systems [8-10].
Cloud providers are starting to incorporate energy efficiency as a concern in hardware, cooling efficiency, and monitoring [43]. In the coming years, the public will increasingly demand that political actors, institutions, and corporations make energy efficiency a priority [44].
Approach to the Problem
Based on an energy consumption model for Apache Spark, the authors propose an energy-aware scheduling algorithm for that general-purpose large-scale processing framework, enhancing energy efficiency of Spark Jobs.
In physics, energy efficiency measures the useful output of an energy conversion process against its input. In technology studies, it has come to mean minimizing the amount of energy consumed to deliver a given energy service. For a constant computational workload, it is generally not possible to reduce energy consumption while maintaining the level of resource utilization, and therefore while maintaining or improving total execution time. An SLA (service-level agreement) should be in place to determine what degree of time-performance degradation is acceptable for a given workload.
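As a minimal illustration of this trade-off (purely illustrative; the function and its numbers are ours, not the authors'), a schedule that trades execution time for energy can be checked against an SLA deadline:

```python
def acceptable_tradeoff(t_base, t_sched, e_base, e_sched, deadline):
    """Check whether an energy-aware schedule meets the SLA deadline,
    and report the slowdown and energy saving it implies."""
    slowdown = (t_sched - t_base) / t_base   # relative increase in time
    saving = (e_base - e_sched) / e_base     # relative energy reduction
    return t_sched <= deadline, slowdown, saving

# A job slowed from 52 s to 55.4 s (about 6.5%) to cut energy from
# 1000 J to 588 J (41.2%) still meets a 60 s deadline:
ok, slow, save = acceptable_tradeoff(52.0, 55.4, 1000.0, 588.0, 60.0)
print(ok)  # True
```

The illustrative numbers mirror the order of magnitude of the time/energy trade-offs reported later for the Sort benchmark.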
Apache Spark provides a distributed memory abstraction of the cluster named the Resilient Distributed Dataset (RDD). According to RDD dependencies, each application is divided into inter-dependent stages, and each stage is composed of a set of tasks. Tasks are dispatched from the master node to the worker nodes of a Spark cluster, which execute them in parallel.
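The job → stage → task decomposition can be pictured with a small plain-Python model (illustrative only; in Spark the decomposition is performed by the DAGScheduler from RDD lineage, not by user code):

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    stage_id: int
    num_partitions: int  # one task per partition of the stage's last RDD

    def tasks(self):
        return [(self.stage_id, k) for k in range(self.num_partitions)]

@dataclass
class Job:
    job_id: int
    stages: list = field(default_factory=list)

    def all_tasks(self):
        return [t for s in self.stages for t in s.tasks()]

# A job with two inter-dependent stages of 3 and 2 partitions yields
# 5 tasks for the scheduler to dispatch to worker nodes:
job = Job(0, [Stage(0, 3), Stage(1, 2)])
print(len(job.all_tasks()))  # 5
```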
A. Related Work
The authors situate their work in the context of energy consumption models [7-9] and resource consolidation in cloud data centers [10-29]. They identify advances in energy-aware scheduling for MapReduce [5-6,30-33] and in task-scheduling algorithms for Apache Spark [34-39].
1) Energy Consumption Models
The main contributors to energy consumption in physical servers are the CPU, disk, memory, and network. [7] proposed a power consumption model for blade servers, considering these four aspects. [8] introduced an energy-efficient model based on CPU and disk utilization, discovering optimal balance points between energy consumption and resource utilization. [9] developed a toolkit called eTune, which directly measures the energy consumption of physical servers and their components, including CPU, disk, memory, drives, and fans.
2) Resource Consolidation
The aim of applying these techniques is to minimize costs, while adhering to SLA constraints.
The techniques used for resource consolidation in cloud data centers include cooperative provisioning [10], two-tiered on-demand resource allocation [11], and allocation mechanisms that consider heterogeneous resources [12-14]. Heuristics such as greedy algorithms have been widely explored [16-21]. Genetic algorithms [22-23] and particle swarm optimization have also gained attention due to their faster convergence [24-27]. Particle Swarm Optimization was applied to VM migration as part of an energy-efficient and QoS-aware model for consolidating virtual resources in cloud data centers [28].
3) Energy-aware MapReduce Scheduling
Given that MapReduce has been widely adopted, several studies on energy-aware scheduling have been conducted over the years. [5] proposed that resource cost in the cloud could be reduced by employing a service model composed of automatic cluster configuration generation and global resource allocation optimization. [30] presents a low-cost workload generator that achieves good results in reducing the completion time of small MapReduce jobs. [6] proposes a heuristic task-scheduling algorithm based on a-priori energy consumption profiling of MapReduce jobs. [31] proposes techniques to handle the generation of small partitions in skewed datasets. [32] performs configuration generation in heterogeneous clusters by dividing the Hadoop cluster into smaller homogeneous subclusters. [33] proposes optimizations of scheduling algorithms aimed at reducing energy consumption.
4) Task Scheduling Algorithms for Apache Spark
While MapReduce performs its computations by reading from and writing to disk at each step, Apache Spark performs intermediate computations directly in memory. The topic of energy efficiency in Spark still has much room for improvement.
[34] considers the cost of applications and their completion time to provide fine-grained resource allocation, increasing performance and reducing costs. [35] proposes a completion-time prediction model that considers the size of the input data, the number of iterations, and the dimension of the underlying infrastructure. [36] proposes two types of optimizations to the Spark implementation of the Random Forest algorithm, including a task-parallel optimization that reduces the cost of data communication by using different task schedulers based on the data locality constraints of tasks. [37] aims to predict the remaining time of tasks, identify time-consuming tasks, and reduce over-execution in heterogeneous clusters. [38] proposes a QoS-aware scheduling strategy based on entropy. [39] found that executing Spark applications on all nodes may waste resources without reducing completion time, and proposes a dynamic partition configuration algorithm to address this issue.
B. Energy Consumption Model for Spark
The authors formalize an energy consumption model for Spark based on 8 definitions, namely:
A Job is the set of tasks that must run to evaluate a Spark action.
Due to RDD dependencies, each job is divided by the DAGScheduler into several stages; a job may consist of n stages. The default task scheduler of Apache Spark follows either a FIFO or a FAIR scheduling strategy.
According to the number of partitions in the last RDD, each stage is divided into several tasks.
The master node distributes tasks to the worker nodes in the cluster; the worker nodes are responsible for communicating with the master node and monitoring executor processes. Each executor process contains an executor object which holds a thread pool, and each thread can execute only one task at a time. In standalone mode, there is one executor process per node, assigned all the available memory resources. Exe is the set of executor processes in a Spark cluster (formula 4).
The relation R_ijkl is defined as 1 if task_ijk is assigned to exe_l, and 0 otherwise (formula 5).
The energy consumption of the jth stage of job i is then defined in formula 6, with e_ijkl as the energy consumption of exe_l when executing task_ijk: E_ij = Σ_k Σ_l R_ijkl · e_ijkl. The energy consumption of the ith job in a Spark application is the sum of its stage energies, E_i = Σ_j E_ij, and the energy consumption of a Spark application is the sum of its job energies, E = Σ_i E_i.
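Under the assumption that the job- and application-level formulas are the straightforward sums over stages and jobs, the model's nested summations can be sketched in Python (the dictionary encoding is our illustrative choice, not the authors' notation):

```python
# R[(i, j, k, l)] = 1 if task_ijk (the k-th task of the j-th stage of
# job i) is assigned to executor exe_l, else 0 (formula 5).
# e[(i, j, k, l)] = energy consumed by exe_l when executing task_ijk.

def stage_energy(i, j, R, e):
    """E_ij = sum over k, l of R_ijkl * e_ijkl (formula 6)."""
    return sum(e[key] for key, assigned in R.items()
               if assigned and key[0] == i and key[1] == j)

def job_energy(i, R, e):
    """Energy of job i: the sum of its stage energies."""
    stages = {key[1] for key in R if key[0] == i}
    return sum(stage_energy(i, j, R, e) for j in stages)

def app_energy(R, e):
    """Energy of the application: the sum of its job energies."""
    jobs = {key[0] for key in R}
    return sum(job_energy(i, R, e) for i in jobs)

# Two tasks of stage (i=0, j=0), placed on executors 0 and 1:
R = {(0, 0, 0, 0): 1, (0, 0, 1, 1): 1, (0, 0, 1, 0): 0}
e = {(0, 0, 0, 0): 3.0, (0, 0, 1, 1): 2.5, (0, 0, 1, 0): 4.0}
print(app_energy(R, e))  # 5.5
```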
C. Energy-aware Spark Algorithm
Based on the formulas in the model, the authors propose an energy-aware scheduling algorithm for Spark (EASAS). For that, they define the evaluation criterion ave_l as the average energy consumption of executor process exe_l.
The average energy consumption ave_l of process exe_l can be calculated by obtaining its energy consumption per second and dividing it by the total number of tasks it executed in the stage.
The proposed algorithm can be broken down into the following steps:
The input for the algorithm is the executor queues Exe and the stages that need to be allocated.
A strategy table recording the historical execution time and energy consumption is maintained. Using that table, the executor processes exe_l are sorted by ave_l in ascending order; a lower ave_l gives exe_l a higher priority in task assignment. Executors with an execution time equal to 0 are placed at the head of the queue.
Two sets are maintained while running the algorithm: Set0 and TaskQueue. Tasks whose recorded execution time on the current executor is 0 are placed in Set0; the remaining tasks are pushed, in ascending order, onto the double-ended queue TaskQueue.
In each iteration, the feasibility of allocating the task to the current executor is evaluated. If it is feasible, the current executor pulls a task from Set0 and starts executing it. If not, a task is pulled from the tail of the queue.
If resources are insufficient, the next executor is selected for a new iteration. When both Set0 and TaskQueue are empty, all tasks have been assigned.
With n = number of executors, t = number of tasks, and cap = number of tasks on an executor, the running time of the loop is O(n(t + log n) + (t/cap)(t log t + cap)).
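The greedy loop described above can be sketched as follows. This is a simplified reconstruction under our own assumptions: the Set0 handling of never-executed tasks is folded into the ave_l ordering, and the executor capacities, task cost estimates, and feasibility check are stand-ins for details given in the paper's pseudocode:

```python
from collections import deque

def easas_schedule(executors, tasks, ave, capacity):
    """Simplified EASAS-style greedy assignment.
    executors: executor ids.
    tasks: task id -> estimated cost (used to order TaskQueue).
    ave: executor id -> average energy per task from the strategy
         table (a sentinel such as 0 for never-observed executors
         sorts them to the head of the queue).
    capacity: executor id -> number of tasks it can take.
    """
    ordered = sorted(executors, key=lambda l: ave[l])  # ascending ave_l
    queue = deque(sorted(tasks, key=tasks.get))        # TaskQueue, ascending cost
    assignment = {}
    for l in ordered:                                  # best executor first
        free = capacity[l]
        while free > 0 and queue:
            assignment[queue.popleft()] = l            # place cheapest task
            free -= 1
    return assignment

# Three tasks, two executors; 'b' has the lower average energy, so it
# is filled first:
plan = easas_schedule(['a', 'b'], {'t1': 5, 't2': 1, 't3': 3},
                      ave={'a': 2.0, 'b': 1.2}, capacity={'a': 2, 'b': 2})
print(plan)  # {'t2': 'b', 't3': 'b', 't1': 'a'}
```

The design mirrors the paper's intent of packing as many tasks as possible onto the executor with the lowest average energy before moving on, at the risk (noted by the authors) of overloading individual nodes.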
Review of Results
A variety of experiments were conducted to verify the properties of EASAS, namely comparing its performance with the FIFO and FAIR scheduling strategies.
Four workloads from the HiBench benchmark suite [40] were selected: Sort, TeraSort, PageRank, and K-means clustering.
The Spark cluster consists of 6 nodes. Each physical node has 8 GB of memory, a 16-core processor, and a 500 GB hard drive; in total, the cluster has 48 GB of memory, 96 cores, 3000 GB of storage, and a network speed of 1 Gbps.
Energy consumption is calculated by monitoring the resource usage of executors: a script records CPU and memory usage in each executor every second, and the energy model outlined in [41] is used to compute energy consumption. Execution time is obtained by analyzing the Spark event logs.
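A sketch of how such per-second samples can be turned into an energy figure, assuming a simple linear utilization-based power model (the actual model is the one in [41]; the P_idle and P_max values here are placeholders, and memory usage is omitted for brevity):

```python
def energy_from_samples(cpu_util, p_idle=70.0, p_max=150.0, interval=1.0):
    """Integrate power over per-second CPU utilization samples.
    Linear model: P = P_idle + (P_max - P_idle) * util, in watts;
    returns energy in joules (watts * seconds)."""
    return sum((p_idle + (p_max - p_idle) * u) * interval for u in cpu_util)

# Three seconds at 50% CPU: 3 * (70 + 80 * 0.5) = 330 J
print(energy_from_samples([0.5, 0.5, 0.5]))  # 330.0
```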
Each workload runs 100 times: 50 runs to generate the strategy table and 50 runs as test data, reported with a 95% confidence interval. An average is calculated over the valid data.
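The averaging with a 95% confidence interval can be reproduced as below; the paper does not spell out its CI method, so this sketch assumes the common normal approximation (z = 1.96) with the sample standard deviation:

```python
import math

def mean_ci95(samples):
    """Mean and 95% confidence-interval half-width of a list of run
    measurements, using the sample standard deviation and z = 1.96."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

m, h = mean_ci95([10.0, 12.0, 11.0, 13.0, 9.0])
print(round(m, 2), round(h, 2))  # 11.0 1.39
```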
For the Sort benchmark, EASAS consumed on average 34.4% less energy than FIFO and 34.6% less than FAIR, with a deadline of 60 seconds. The authors varied the number of partitions from 10 to 100, and the recorded energy consumption was always lower than that of FIFO and FAIR. Concerning execution time, with a low number of partitions EASAS records a higher running time than the other scheduling strategies; from 40 to 100 partitions, however, it is close to them, showing that it can reduce energy consumption without increasing execution time too much. At the optimal point, execution time increases by 6.5% while energy consumption is reduced by 41.2% compared to FAIR.
For the TeraSort benchmark, the energy consumption of EASAS is on average 29.8% lower than FIFO and 26.3% lower than FAIR. As the number of spark shuffle partitions increases, the execution time of TeraSort under EASAS decreases significantly. With more than 50 partitions, the execution time of EASAS is similar to that of FIFO and FAIR. Optimal scheduling was achieved at 60 partitions: energy consumption is 35.8% less than that of FIFO and execution time is reduced by 7.4%; compared to FAIR, energy consumption is reduced by 33.4% and execution time by 2.7%.
For PageRank, energy consumption is reduced by 27.1% and 25.6% on average compared with FIFO and FAIR respectively. The energy consumption of the workload under EASAS still increases with the number of spark shuffle partitions, while its execution time decreases significantly as the number of partitions grows. The optimal point is achieved at 30 partitions; energy consumption in that case is 51.2% lower than FIFO and 56.3% lower than FAIR. Although the execution time for EASAS is longer than that of FIFO and FAIR, all jobs are completed within the deadline constraint (120 s).
The execution time of EASAS is higher than that of the default scheduling algorithms. PageRank has more stages than Sort and TeraSort, which means more spark shuffles, negatively affecting execution time. As the number of partitions increases, the execution time of EASAS gradually approaches that of FAIR and FIFO.
For K-means clustering, which consists of 14 jobs and 20 stages, EASAS produces the lowest energy consumption in all tests. Energy consumption is reduced on average by 22.6% and 28.2% compared to FIFO and FAIR respectively. The optimal point is achieved at 50 partitions; there, energy consumption is reduced by 51.2% and 65.3% compared to FIFO and FAIR respectively. The execution time of the proposed algorithm is lower in most cases: on average 4.5% lower than FIFO and 6.1% lower than FAIR, and at the optimal point 5.8% and 7.5% lower than FIFO and FAIR respectively.
Compared with other studies that focused on workload balancing [30-31] or on minimizing the makespan of job execution [34-39], the proposed algorithm can significantly reduce energy consumption in Spark applications while satisfying deadline constraints.
When discussing the experimental results, the authors highlight three advantages of EASAS over previously proposed approaches: (1) it is based on an energy consumption model and is applicable to Spark, which is widely used; (2) it aims to minimize energy consumption while satisfying SLAs; and (3) the strategy table and greedy approach make it possible to find the optimal executor and place as many tasks as possible on it.
EASAS does, however, slightly increase execution time compared to FAIR and FIFO. The reasons for this may include: the overhead of reading the strategy table and sorting the executors and tasks; data communication costs incurred when selecting the optimal executor based on the energy consumption criterion; and the possibility that the greedy approach overloads physical nodes.
Conclusions
As Spark is widely adopted as a processing engine for big data applications, incorporating energy efficiency as a scheduling criterion may contribute to lower energy consumption in data centers. The approach taken by Hongjian et al. outlines an energy consumption model that makes it possible to calculate the energy consumption of a Spark application and to base scheduling decisions on expected energy consumption while satisfying SLAs, by recording execution time and energy consumption in a historical strategy table and allocating tasks to the optimal executor.
The tests executed four workloads from the HiBench suite 100 times each, and the results confirm that EASAS can reduce the energy consumption of a Spark cluster while only slightly increasing execution time, and that it does so while meeting task duration deadlines.
For future work, the authors point out that the scheduling strategy may be refined to trade off execution time and energy consumption dynamically, avoiding SLA violations. They also state that they would like to continue optimizing the algorithm for small shuffle blocks.
References
[1] Hintemann, R., Beucker, S., Clausen, J., Stobbe, L., Proske, M.,
Nissen, N.F.: Energy efficiency of data centers-A system-oriented
analysis of current development trends. In: Proceedings of the
Electronics Goes Green 2016+ (EGG), pp. 1–5. IEEE (2016)
[2] Salahuddin, M., Alam, K.: Information and Communication
Technology, electricity consumption and economic growth in
OECD countries: a panel data analysis. Int. J. Electr. Power
Energy Syst. 76, 185–193 (2016)
[3] Zaharia, M., Franklin, M.J., Ghodsi, A., Gonzalez, J., Shenker, S.,
Stoica, I., Venkataraman, S., et al.: Apache Spark. Commun.
ACM 59(11), 56–65 (2016)
[4] Zhang, A.Z.: Spark Technology Insider. Mechanical Industry
Press, Beijing (2015)
[5] Palanisamy, B., Singh, A., Liu, L.: Cost-effective resource provisioning
for mapreduce in a cloud. IEEE Trans. Parallel Distrib.
Syst. 26(5), 1265–1279 (2015)
[6] Mashayekhy, L., Nejad, M.M., Grosu, D., Zhang, Q., Shi, W.:
Energy-aware scheduling of mapreduce jobs for big data applications.
IEEE Trans. Parallel Distrib. Syst. 1, 1–1 (2015)
[7] Buyya, R., Yeo, C.S., Venugopal, S., Broberg, J., Brandic, I.:
Cloud computing and emerging IT platforms: vision, hype, and
reality for delivering computing as the 5th utility. Fut. Gener.
Comput. Syst. 25(6), 599–616 (2009)
[8] Srikantaiah, S., Kansal, A., Zhao, F.: Energy aware consolidation
for cloud computing. Clust. Comput. 12, 10 (2008)
[9] Ge, R., Feng, X., Wirtz, T., Zong, Z., Chen, Z.: ETune: a power
analysis framework for data-intensive computing. In: Proceedings
of the 2012 41st International Conference on Parallel Processing
Workshops, pp. 254–261. IEEE (2012)
[10] Zhan, J., Wang, L., Li, X., Shi, W., Weng, C., Zhang, W., Zang,
X.: Cost-aware cooperative resource provisioning for heterogeneous
workloads in data centers. IEEE Trans. Comput. 62(11),
2155–2168 (2013)
[11] Song, Y., Sun, Y., Shi, W.: A two-tiered on-demand resource
allocation mechanism for VM-based data centers. IEEE Trans.
Serv. Comput. 6(1), 116–129 (2013)
[12] Nejad, M.M., Mashayekhy, L., Grosu, D.: Truthful greedy
mechanisms for dynamic virtual machine provisioning and allocation
in clouds. IEEE Trans. Parallel Distrib. Syst. 26(2),
594–603 (2015)
[13] Mashayekhy, L., Nejad, M.M., Grosu, D.: Cloud federations in
the sky: formation game and mechanism. IEEE Trans. Cloud
Comput. 3(1), 14–27 (2015)
[14] Mashayekhy, L., Nejad, M.M., Grosu, D., Vasilakos, A.V.:
Incentive-compatible online mechanisms for resource provisioning and allocation in clouds. In: Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing (CLOUD), pp. 312–319. IEEE (2014)
[15] Hacker, T.J., Mahadik, K.: Flexible resource allocation for reliable
virtual cluster computing systems. In: Proceedings of the
2011 International Conference for High Performance Computing,
Networking, Storage and Analysis, p. 48. ACM (2011)
[16] Rajyashree, V.R.: Double threshold based load balancing
approach by using VM migration for the cloud computing environment.
Int. J. Eng. Comput. Sci. 4(01), 9966–9970 (2015)
[17] Verma, A., Ahuja, P., Neogi, A.: pMapper: power and migration
cost aware application placement in virtualized systems. In:
Proceedings of the 9th ACM/IFIP/USENIX International Conference
on Middleware, pp. 243–264. Springer, New York (2008)
[18] Beloglazov, A., Abawajy, J., Buyya, R.: Energy-aware resource
allocation heuristics for efficient management of data centers for
cloud computing. Fut. Gener. Comput. Syst. 28(5), 755–768
(2012)
[19] Beloglazov, A., Buyya, R.: Optimal online deterministic algorithms
and adaptive heuristics for energy and performance efficient
dynamic consolidation of virtual machines in cloud data
centers. Concurr. Comput. 24(13), 1397–1420 (2012)
[20] Gupta, R., Bose, S.K., Sundarrajan, S., Chebiyam, M., Chakrabarti,
A.: A two stage heuristic algorithm for solving the server
consolidation problem with item-item and bin-item incompatibility
constraints. In: Proceedings of the 2008 SCC’08, IEEE
International Conference on Services Computing, vol. 2,
pp. 39–46. IEEE (2008)
[21] Dai, X., Wang, J.M., Bensaou, B.: Energy-efficient virtual
machines scheduling in multi-tenant data centers. IEEE Trans.
Cloud Comput. 4(2), 210–221 (2016)
[22] Quang-Hung, N., Nien, P.D., Nam, N. H., Tuong, N.H., Thoai,
N.: A genetic algorithm for power-aware virtual machine allocation
in private cloud. In: Proceedings of the Information and
Communication Technology-EurAsia Conference, pp. 183-191.
Springer, Berlin (2013)
[23] Agrawal, S., Bose, S.K., Sundarrajan, S.: Grouping genetic
algorithm for solving the server consolidation problem with
conflicts. In: Proceedings of the first ACM/SIGEVO Summit on
Genetic and Evolutionary Computation, pp. 1–8. ACM (2009)
[24] Eberhart, R., Kennedy, J.: A new optimizer using particle swarm
theory. In: Proceedings of the Sixth International Symposium on
Micro Machine and Human Science, MHS’95, pp. 39–43. IEEE
(1995)
[25] Del Valle, Y., Venayagamoorthy, G.K., Mohagheghi, S., Harley,
R.G., Hernandez, J.C.: Particle swarm optimization: basic concepts,
variants and applications in power systems. IEEE Trans.
Evol. Comput. 12, 171–195 (2008)
[26] Zeng, N., Wang, Z., Zhang, H., Alsaadi, F.E.: A novel switching
delayed PSO algorithm for estimating unknown parameters of
lateral flow immunoassay. Cogn. Comput. 8(2), 143–152 (2016)
[27] Xiong, A.P., Xu, C.X.: Energy efficient multiresource allocation
of virtual machine based on PSO in cloud data center. Math.
Probl. Eng. (2014). https://doi.org/10.1155/2014/816518
[28] Li, H., Zhu, G., Cui, C., Tang, H., Dou, Y., He, C.: Energy-efficient
migration and consolidation algorithm of virtual
machines in data centers for cloud computing. Computing 98(3),
303–317 (2016)
[29] Li, H., Zhu, G., Zhao, Y., Dai, Y., Tian, W.: Energy-efficient and
QoS-aware model based resource consolidation in cloud data
centers. Clust. Comput. 20(3), 2793–2803 (2017)
[30] Ren, Z., Wan, J., Shi, W., Xu, X., Zhou, M.: Workload analysis,
implications, and optimization on a production hadoop cluster: a
case study on taobao. IEEE Trans. Serv. Comput. 7(2), 307–321
(2014)
[31] Gufler, B., Augsten, N., Reiser, A., Kemper, A.: Handling Data
skew in MapReduce. Closer 11, 574–583 (2011)
[32] Cheng, D., Rao, J., Guo, Y., Jiang, C., Zhou, X.: Improving
performance of heterogeneous mapreduce clusters with adaptive
task tuning. IEEE Trans. Parallel Distrib. Syst. 28(3), 774–786
(2017)
[33] Tian, W., Li, G., Yang, W., Buyya, R.: HScheduler: an optimal
approach to minimize the makespan of multiple MapReduce jobs.
J. Supercomput. 72(6), 2376–2393 (2016)
[34] Islam, M.T., Karunasekera, S., Buyya, R.: dSpark: deadline-based
resource allocation for big data applications in Apache Spark. In:
Proceedings of the 2017 IEEE 13th International Conference on
e-Science (e-Science), pp. 89–98. IEEE (2017)
[35] Sidhanta, S., Golab, W., Mukhopadhyay, S.: OptEx: a deadline-aware
cost optimization model for Spark. In: Proceedings of the
2016 16th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing (CCGrid), pp. 193–202. IEEE (2016)
[36] Chen, J., Li, K., Tang, Z., Bilal, K., Yu, S., Weng, C., Li, K.: A
parallel random forest algorithm for big data in a spark cloud
computing environment. IEEE Trans. Parallel Distrib. Syst. 1,
1–1 (2017)
[37] Yang, H., Liu, X., Chen, S., Lei, Z., Du, H., Zhu, C.: Improving
Spark performance with MPTE in heterogeneous environments.
In: Proceedings of the 2016 International Conference on Audio,
Language and Image Processing (ICALIP), pp. 28–33. IEEE
(2016)
[38] Chen, H., Wang, F.Z.: Spark on entropy: a reliable & efficient
scheduler for low-latency parallel jobs in heterogeneous cloud.
In: Proceedings of the 2015 IEEE 40th International Conference
on Local Computer Networks Conference Workshops (LCN
Workshops), pp. 708–713. IEEE (2015)
[39] Gounaris, A., Kougka, G., Tous, R., Montes, C.T., Torres, J.:
Dynamic configuration of partitioning in spark applications.
IEEE Trans. Parallel Distrib. Syst. 28(7), 1891–1904 (2017)
[40] Huang, S., Huang, J., Dai, J., Xie, T., Huang, B.: The HiBench
benchmark suite: characterization of the MapReduce-based data
analysis. In: Proceedings of the 2010 IEEE 26th International
Conference on Data Engineering Workshops (ICDEW),
pp. 41–51. IEEE (2010)
[41] Luo, L., Wu, W.J., Zhang, F.: Energy modeling based on cloud
data center. J. Softw. 25(7), 1371–1387 (2014)
[42] Meadows, D.H., et al.: The Limits to Growth; a Report for the Club of Rome's Project on the Predicament of Mankind. Universe Books, New York (1972)
[43] Amazon Web Services: Sustainability Resources. https://sustainability.aboutamazon.com/environment/the-cloud (Accessed 2023-09-17)
[44] United Nations Framework Convention on Climate Change (UNFCCC): The Paris Agreement. https://unfccc.int/process-and-meetings/the-paris-agreement (Accessed 2023-09-17)