Maximizing Resource Utilization with PAWS
Overview
In a data center, resource utilization metrics, especially CPU utilization, are often used to measure the efficiency of a cluster. In practice, both resource allocation and workload scheduling affect resource utilization.
If resources are allocated to workloads inaccurately, the cluster may be underutilized because resources are either left idle or fall short of what workloads need. Even when utilization rises, improper scheduling decisions can cause resource conflicts between workloads and degrade overall performance.
Performance Aware System (PAWS) was developed to improve resource utilization and ensure quality of service (QoS) while minimizing the performance degradation caused by resource interference.
The vision of PAWS is to deliver a set of scheduling algorithms that can recommend resources based on historical workload characteristics while avoiding mutual interference.
Features
PAWS is designed to address the issue of resource interference and recommend appropriate resource allocation. It has the following features:
Feature 1: VPA Resource Recommendation
Vertical Pod Autoscaler (VPA) is an automatic scaling technology that adjusts the physical resources (such as CPUs and memory) allocated to microservices to meet changing demand. Each service has its own resource demands, which are influenced by factors such as time and user needs. Allocating a fixed amount of resources to these services can result in very low cluster resource utilization.
PAWS therefore proposes a VPA recommendation algorithm that combines classical numerical optimization with machine learning techniques. The algorithm analyzes historical workload characteristics to recommend appropriate resources for each workload, freeing up improperly allocated resources and thereby improving cluster utilization.
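To make the idea of recommending resources from historical usage concrete, here is a minimal sketch. It is not the PAWS algorithm, whose optimization and machine learning details are not described here. It swaps in a much simpler technique: a nearest-rank percentile of past usage plus a fixed headroom. The function name, parameters, and sample data are all hypothetical.

```python
def recommend_request(usage_samples, percentile=95, headroom_pct=15):
    """Suggest a resource request from historical usage samples.

    Assumption: a simple percentile-plus-headroom rule stands in for the
    PAWS recommender, which is not specified in this article.
    """
    if not usage_samples:
        raise ValueError("usage_samples must not be empty")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile, computed with integer arithmetic.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Add headroom so short bursts above the percentile are still covered.
    return base * (100 + headroom_pct) / 100


# Hypothetical CPU usage history for one workload, in millicores.
history = [120, 135, 150, 110, 400, 140, 130, 125, 145, 138]
current_request = 1000  # hypothetical fixed allocation

suggested = recommend_request(history, percentile=90, headroom_pct=20)
freed = current_request - suggested
print(f"suggested request: {suggested} m, freed: {freed} m")
```

With this sample history, the single 400 m spike falls above the 90th percentile, so the suggested request stays close to typical usage and most of the fixed 1000 m allocation could be returned to the cluster. A real recommender would need to decide how much weight such outliers deserve.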
Feature 2: Detection and Scheduling of Time Series Conflicts
PAWS provides a set of scheduling plugins that perform time series analysis on historical resource utilization statistics. It uses a specific mechanism to collect the resources marked in the system. By collecting historical resource usage data from job containers and analyzing it over recurring cycles (such as every hour), it predicts the resource utilization of each cycle. The scheduler then staggers workloads with different peak and off-peak periods to prevent resource conflicts between jobs, which leads to more accurate resource allocation and higher cluster resource utilization.
The algorithm behind this feature comprises a prediction process and a scheduling process. The prediction process collects statistics on time series changes based on the historical data of each workload type. The scheduling process then makes scheduling decisions based on the collected statistics and the characteristics of each new task.
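The two processes can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the PAWS plugin code. Prediction is reduced to averaging past usage per slot of a repeating cycle, and scheduling picks the node whose predicted combined peak is lowest and still within capacity. All names, node profiles, and numbers are hypothetical.

```python
def cycle_profile(samples, period):
    """Prediction step: average usage for each slot of a repeating cycle.

    samples is a list of (time_index, usage) pairs. Assumption: a plain
    per-slot average stands in for the real prediction model.
    """
    totals = [0.0] * period
    counts = [0] * period
    for time_index, usage in samples:
        slot = time_index % period
        totals[slot] += usage
        counts[slot] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def pick_node(node_profiles, task_profile, capacity):
    """Scheduling step: choose the node with the lowest predicted peak."""
    best_name, best_peak = None, None
    for name in sorted(node_profiles):
        # Predicted peak if the task's profile is stacked on the node's.
        peak = max(n + t for n, t in zip(node_profiles[name], task_profile))
        if peak > capacity:
            continue  # predicted conflict: peaks overlap beyond capacity
        if best_peak is None or peak < best_peak:
            best_name, best_peak = name, peak
    return best_name, best_peak


# Hypothetical 4-slot cycle; usage in CPU cores.
task_history = [(0, 2.0), (1, 2.0), (2, 0.5), (3, 0.5),
                (4, 2.0), (5, 2.0), (6, 0.5), (7, 0.5)]
task_profile = cycle_profile(task_history, period=4)

node_profiles = {
    "node-a": [3.0, 3.0, 1.0, 1.0],  # busy in the same slots as the task
    "node-b": [1.0, 1.0, 3.0, 3.0],  # busy when the task is quiet
}
chosen, predicted_peak = pick_node(node_profiles, task_profile, capacity=4.0)
print(chosen, predicted_peak)
```

In this example the task peaks in the first half of the cycle, so it is placed on the node that peaks in the second half. Placing it on the other node would push the predicted peak past the capacity limit even though both nodes have the same average load.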
Summary
PAWS helps with more efficient resource allocation and scheduling for workloads through machine learning and mathematical analysis, leading to substantial boosts in overall cluster utilization. While robust, it does have limitations, notably its emphasis on CPU-intensive scenarios. Thus, we invite developers keen on this technology to collaborate with us in refining PAWS continually.
For further details on PAWS and other features of openEuler, please visit https://gitee.com/openeuler.