- Use the capacity utilization metric to determine the right pricing tier. The capacity metric indicates how much of the provisioned APIM resource you are actually using, expressed on a scale of 0 to 100. For instance, if your APIM instance shows Capacity = 20, it is underutilized and you may consider downgrading the pricing tier. Likewise, if it shows Capacity = 85, you may run out of capacity soon and should configure or review auto-scale rules so that additional units spin up under increased load. Again, prefer scale-out over scale-up to beef up cloud resources and take advantage of elasticity. Only when you need a feature available exclusively in a higher tier should you consider scaling up.
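As a sketch, a capacity-based auto-scale rule can be wired up with the Azure CLI. Resource names below are placeholders, and the thresholds and evaluation periods are illustrative; auto-scale is only available on tiers that support adding units (e.g. Standard and Premium):

```shell
# Attach an autoscale setting to an existing APIM instance
# (my-rg / my-apim are placeholder names)
az monitor autoscale create \
  --resource-group my-rg \
  --resource my-apim \
  --resource-type Microsoft.ApiManagement/service \
  --name apim-autoscale \
  --min-count 1 --max-count 3 --count 1

# Scale out by one unit when average Capacity stays above 70%
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name apim-autoscale \
  --condition "Capacity > 70 avg 10m" \
  --scale out 1

# Scale back in when average Capacity drops below 35%
az monitor autoscale rule create \
  --resource-group my-rg \
  --autoscale-name apim-autoscale \
  --condition "Capacity < 35 avg 10m" \
  --scale in 1
```

Keep a comfortable gap between the scale-out and scale-in thresholds to avoid flapping.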
- Use the Developer tier for non-production workloads. The Developer tier costs roughly one-third of the Basic tier, and self-hosted gateways and workspaces are free under it. On the other hand, you lose the SLA and auto-scale capability. In most cases, those limitations are acceptable for non-production workloads.
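For illustration, provisioning a non-production instance on the Developer tier via the Azure CLI might look like this (names and email are placeholders):

```shell
# Create a Developer-tier APIM instance for non-production use
az apim create \
  --name my-apim-dev \
  --resource-group my-rg \
  --publisher-name Contoso \
  --publisher-email api@contoso.com \
  --sku-name Developer
```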
- Use the consumption-based tier over fixed-provisioned tiers if its limitations do not impact your use case. The Consumption tier, as the name suggests, charges per call, and as a cherry on top, the first million calls per month per subscription are free. The SLA matches the Standard tier, and auto-scale works out of the box. But the Consumption tier has a few limitations: you lose VNet integration and caching features, which may be a deal-breaker in some scenarios. Check the comparison table between the various tiers here.
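Creating a Consumption-tier instance is the same call with a different SKU; as above, names are placeholders:

```shell
# Create a Consumption-tier APIM instance; billing is per call,
# with the first million calls per month per subscription free
az apim create \
  --name my-apim-serverless \
  --resource-group my-rg \
  --publisher-name Contoso \
  --publisher-email api@contoso.com \
  --sku-name Consumption
```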
- Use custom auto-scale rules (based on the capacity metric or a schedule) instead of manually scaling to a fixed instance count.
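A schedule-based rule can be expressed as a recurring autoscale profile. This sketch assumes an autoscale setting named `apim-autoscale` already exists; the schedule and counts are illustrative:

```shell
# Recurring weekday profile: run 3 units during business hours,
# falling back to the default profile outside this window
az monitor autoscale profile create \
  --resource-group my-rg \
  --autoscale-name apim-autoscale \
  --name business-hours \
  --recurrence week mon tue wed thu fri \
  --start 08:00 --end 18:00 \
  --timezone "UTC" \
  --count 3
```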
- Use the workspaces feature instead of spinning up a new APIM resource for each project or team, thereby saving the fixed cost of each resource.
- Upgrade Premium v2 workloads to Premium v3. Premium v3 workloads are eligible for Azure savings plans and Azure reservations, which yield a decent amount of savings over a 1-to-3-year horizon. Additionally, Premium v3 instances (thanks to newer CPU setups) provide more CPU and memory per dollar than their v2 equivalents.
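If the underlying App Service scale unit supports Premium v3, the move can be an in-place SKU change (otherwise a redeployment to a new plan may be needed); names are placeholders:

```shell
# Move an existing App Service plan from Premium v2 to Premium v3
az appservice plan update \
  --resource-group my-rg \
  --name my-plan \
  --sku P1V3
```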
- Use scale-out over scale-up as the strategy for handling additional load on an app. Scale-up is inefficient and inelastic by nature: you may scale up an instance to handle a peak workload, but the moment the workload drops to its minimum, you are paying for unused capacity. Scale-out, on the other hand, is efficient and elastic. You can define custom auto-scale rules to spin up additional instances during high-load periods and spin down unutilized instances during low-load periods, thereby saving you dollars.
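For an App Service plan, a CPU-driven scale-out/scale-in pair might look like this (placeholder names, illustrative thresholds):

```shell
# Autoscale setting for an App Service plan
az monitor autoscale create \
  --resource-group my-rg \
  --resource my-plan \
  --resource-type Microsoft.Web/serverfarms \
  --name plan-autoscale \
  --min-count 1 --max-count 5 --count 1

# Add an instance when average CPU stays above 70%
az monitor autoscale rule create \
  --resource-group my-rg --autoscale-name plan-autoscale \
  --condition "CpuPercentage > 70 avg 10m" --scale out 1

# Remove an instance when average CPU drops below 30%
az monitor autoscale rule create \
  --resource-group my-rg --autoscale-name plan-autoscale \
  --condition "CpuPercentage < 30 avg 10m" --scale in 1
```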
- In non-prod environments, pack more apps per App Service plan than you would in production. Suppose you need to run 5 apps on Azure. In non-prod environments, you can run all 5 apps on a single App Service plan with a custom scale-out rule defined. Since all 5 apps share one plan (one VM behind the scenes), the operating system and runtime layers are shared, reducing the overall CPU/memory requirement; if a single instance is insufficient to run the 5 apps, auto-scale spins up an additional instance when required. This way, you don't need 5 App Service plan instances (5 VMs behind the scenes) and can squeeze the apps onto fewer instances. In a production environment, you may use more, or dedicated, App Service plans for these apps to avoid noisy-neighbor disruption.
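As a sketch, five non-production apps sharing one plan could be created like this (plan SKU, resource group, and app names are placeholders; web app names must be globally unique):

```shell
# One shared non-prod App Service plan
az appservice plan create \
  --resource-group my-rg \
  --name shared-nonprod-plan \
  --sku S1

# Five apps packed onto the same plan
for app in app1 app2 app3 app4 app5; do
  az webapp create \
    --resource-group my-rg \
    --plan shared-nonprod-plan \
    --name "nonprod-$app"
done
```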
- Scale the resources needed by ADF pipelines up and down on demand. An ADF pipeline is a background orchestration process that fetches raw data, processes/transforms it, and pushes the transformed data into a data store; ADF can work with a wide variety of data sources and sinks. You can add a pre-stage before the pipeline executes and a post-stage after the execution finishes. In the pre-stage, scale up the required resources (data sources/sinks) so the pipeline runs efficiently and finishes within the expected duration. In the post-stage, scale those resources back down: to the minimum tier if they are used exclusively by ADF, or to the tier they had before the pre-stage scaled them up.
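With a SQL database sink, the pre-stage/post-stage pattern might be sketched as below. Names and tiers are placeholders, the `az datafactory` commands require the `datafactory` CLI extension, and in practice the post-stage should wait for run completion (e.g. by polling `az datafactory pipeline-run show`) rather than run immediately:

```shell
# Pre-stage: scale the SQL sink up before triggering the pipeline
az sql db update \
  --resource-group my-rg --server my-sqlserver --name my-db \
  --service-objective S3

# Trigger the pipeline run
az datafactory pipeline create-run \
  --resource-group my-rg --factory-name my-adf --name my-pipeline

# Post-stage: once the run completes, scale back down
az sql db update \
  --resource-group my-rg --server my-sqlserver --name my-db \
  --service-objective S0
```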
- Avoid geo-redundancy in non-prod workloads. Regional replication with automatic failover lets you serve requests even when the primary Azure region is down. This facility is very useful in a production environment, where every minute of downtime causes loss of revenue. However, it is avoidable in non-production SQL database workloads, as test databases rarely have the high-availability requirements expected of production databases.
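One concrete lever, as a sketch with placeholder names: create test databases with locally redundant backup storage instead of the geo-redundant default, and simply skip configuring failover groups for them:

```shell
# Test database with locally redundant backup storage
az sql db create \
  --resource-group my-rg --server my-sqlserver --name test-db \
  --service-objective S0 \
  --backup-storage-redundancy Local
```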