Demand Scaling - the art of matching compute instances to workload demand
Most applications do not run at 100% load, high demand and maximum transaction throughput all the time. Real-life demand ebbs and flows: it is sometimes high, but mostly low, with some periods of medium demand.
Aligning the amount of compute infrastructure with that real-life demand is the art of demand scaling (a form of intraday scheduling of compute need). In a fully automated and “cloud mature” approach, applications would be built to be “aware” of demand levels and able to scale up, that is, increase the number of server instances, to cater for increased workload, and likewise scale down when demand diminishes.
In cloud computing this is achieved by auto scaling: cloud formation templates (AWS CloudFormation, for example) define the ideal low instance count (say 1 or 2 front-line transaction servers behind a load balancer), the metric to monitor (CPU utilisation, say), and the load levels at which to increase the server count (CPU above 70%, say) or decrease it (CPU below 30%, say).
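The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular cloud provider implements it: the function name, the step size of one server and the 1 to 10 server bounds are assumptions, while the 70% and 30% CPU thresholds are the example values used in the text.

```python
def desired_server_count(current, cpu_percent, minimum=1, maximum=10,
                         scale_up_above=70.0, scale_down_below=30.0):
    """Return the server count a simple threshold rule would ask for.

    Hypothetical helper: adds one server when CPU is above the upper
    threshold, removes one when it is below the lower threshold, and
    otherwise leaves the count unchanged.
    """
    if cpu_percent > scale_up_above:
        target = current + 1      # load is high: scale up by one
    elif cpu_percent < scale_down_below:
        target = current - 1      # load is low: scale down by one
    else:
        target = current          # load is in the comfortable band
    # Never go below the ideal low value or above the agreed peak.
    return max(minimum, min(maximum, target))
```

For example, 2 servers running at 85% CPU would be scaled to 3, while a single server at 20% CPU stays at 1 because the minimum is already reached.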
Using cloud formation templates in this way automates “auto scaling” of the application against demand. The same can also be achieved with scripts for a legacy application that is unable to perform real-time monitoring and load analysis. After human analysis of daily demand profiles, for example, scripts could be written that increase server counts at pre-determined times (say from 1 server to 3 at the 8am start of day, and further around 11am, ready for the midday peak load). In this way the application’s total compute capacity can be scaled up to match peak demand periods and reduced for low-demand periods (such as overnight).
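A scripted schedule of this kind could look something like the sketch below. The 8am step from 1 to 3 servers and the further increase at 11am come from the example above; the exact count at 11am and the afternoon and evening steps back down are assumptions made purely for illustration, as are the names `SCHEDULE` and `scheduled_server_count`.

```python
from datetime import time

# (start time, server count) pairs in chronological order.
# Only the 08:00 and 11:00 steps are taken from the text; the rest
# are assumed values for the sake of a complete daily profile.
SCHEDULE = [
    (time(0, 0), 1),    # overnight: minimum footprint
    (time(8, 0), 3),    # start of day
    (time(11, 0), 5),   # ready for the midday peak (count assumed)
    (time(14, 0), 3),   # after the peak (assumed)
    (time(18, 0), 1),   # end of day (assumed)
]

def scheduled_server_count(now, schedule=SCHEDULE):
    """Return the server count of the latest step that has started by `now`."""
    count = schedule[0][1]
    for start, servers in schedule:
        if now >= start:
            count = servers
    return count
```

A scheduler such as cron could run a script at each step time and hand the result of `scheduled_server_count` to whatever tooling starts and stops the servers.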
Applying this strategy optimises cloud consumption, which is charged while compute instances are running, and achieves cost savings by running the maximum number of servers only when demand is high. Large savings can be achieved by scaling between 1 server and a peak of 10 servers across the business working week and working hours (Monday to Friday, or Monday to Sunday for a full 7-day business need) instead of running the peak 10 servers 24/7.
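To give a feel for the size of the saving, the comparison can be worked through in instance-hours. The figures below are hypothetical: a Monday to Friday week with 10 working hours a day, an average of 5 servers running during those hours, and 1 server at all other times. Real savings depend entirely on the actual demand profile and pricing.

```python
HOURS_PER_WEEK = 24 * 7

def weekly_instance_hours(busy_days, busy_hours_per_day,
                          avg_busy_servers, idle_servers=1):
    """Instance-hours per week for a scaled deployment (hypothetical model)."""
    busy_hours = busy_days * busy_hours_per_day
    idle_hours = HOURS_PER_WEEK - busy_hours
    return busy_hours * avg_busy_servers + idle_hours * idle_servers

# Baseline: the peak 10 servers running 24/7.
always_on = 10 * HOURS_PER_WEEK

# Scaled: assumed profile of 5 days x 10 hours at an average of 5 servers.
scaled = weekly_instance_hours(busy_days=5, busy_hours_per_day=10,
                               avg_busy_servers=5)

# Fraction of instance-hours saved compared with the always-on baseline.
saving = 1 - scaled / always_on
```

Under these assumed figures the scaled deployment uses 368 instance-hours a week against 1,680 for the always-on baseline, a reduction of roughly 78%.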