Confused about DevOps & SRE?
Problem Statement -
In my experience, some organizations use the terms DevOps and SRE interchangeably. To help people and groups understand which one best suits their needs, and what to consider while building these capabilities, let's take a closer look at both concepts. I hope the following is useful for engineers, product managers, technical executives and others.
What is DevOps?
It is an organizational culture aimed at delivering rapid releases by automating and integrating the efforts of development & operations teams. It covers the entire product life cycle: ideate -> define -> code -> test -> build/package -> deploy -> monitor. Over the past 15+ years it has helped companies move from big-bang releases once or twice a year to iterative, incremental, feature-based releases as often as several times a week. DevOps describes “what” needs to be done to unify development and operations.
What is SRE?
SRE (Site Reliability Engineering) is a set of software engineering practices aimed at ensuring the health, performance & stability of mission-critical, complex IT environments. It describes “how” we can unify development and operations.
The two are sides of the same coin and complement each other. The following are five key areas to look at through the DevOps and SRE lenses.
Integrated teams -
DevOps Philosophy - Share ownership of the production environment between developers and operations. Closer collaboration between these groups limits the damage when a deployment goes wrong in production.
SRE Execution - Create cross-functional teams that share the same processes, tools and techniques across the company.
Accept the failures -
DevOps Philosophy - Not all failures can be avoided, and no system can be 100% reliable, so learn from the mistakes and move on.
SRE Execution - Conduct blameless postmortems so that the same failure does not happen twice. Define an error budget: the amount of unreliability a system/service is allowed over a given period.
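To make the error budget concrete, here is a minimal sketch of the arithmetic, assuming the budget is expressed as allowed downtime against an availability target. The function names, the 30-day window and the 99.9% target are illustrative assumptions, not values from any particular SRE tool.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability target over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    if budget == 0:
        raise ValueError("a 100% target leaves no error budget")
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 21.6 minutes of downtime, about half of the budget is left.
    print(round(budget_remaining(0.999, 21.6), 2))
```

A team could use the remaining fraction as a release signal: plenty of budget left means it is safe to ship faster, while an exhausted budget suggests pausing feature work in favor of reliability work.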
Target small incremental changes -
DevOps Philosophy - Make small changes frequently, reducing both the cost of failure and the time-to-market.
SRE Execution - Automate the test suite so that small changes can be verified quickly. Use canary deployments: roll out to a small percentage of servers first, which also makes rollback easier.
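The canary idea can be sketched in a few lines. This is only an illustration under stated assumptions: servers are assigned to the canary group by hashing a hypothetical server ID, and the promote/rollback decision compares error rates with a made-up tolerance. Real rollout tooling typically looks at more signals than this.

```python
import hashlib


def in_canary(server_id: str, percent: int) -> bool:
    """Deterministically place a server in the canary group.

    Hashing the ID gives a stable bucket in [0, 100), so the same server
    stays in the canary group as the percentage is increased.
    """
    digest = hashlib.sha256(server_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def canary_decision(baseline_error_rate: float, canary_error_rate: float,
                    tolerance: float = 0.01) -> str:
    """Return 'rollback' if the canary is clearly worse than the baseline."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    servers = [f"web-{i}" for i in range(200)]  # hypothetical server names
    canary_group = [s for s in servers if in_canary(s, 5)]
    print(f"{len(canary_group)} of {len(servers)} servers get the new build")
    print(canary_decision(baseline_error_rate=0.01, canary_error_rate=0.05))
```

Because only a small group receives the change, a "rollback" decision touches a handful of servers instead of the whole fleet.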
Tools & automation -
DevOps Philosophy - Eliminate manual work as much as possible. Add new tools to support automation.
SRE Execution - Identify manual interventions and automate them for long-term success. This is also referred to as “Automate This Year’s Job Away”.
Measure everything -
DevOps Philosophy - Build metrics that cover the full product lifecycle, to check that it’s moving in the right direction.
SRE Execution - SRE should adopt the following metrics -
SLI - Service level indicator: a measurement of a system’s behavior over a period of time (minutes, hours). Typically expressed as a percentile, median etc. The most common SLIs are latency, error rate (failed requests as a share of all requests) and throughput in requests per second. The SLI is the measured value that targets are set against.
SLO - Service level objective: where an SLI tells you how the service is behaving at a point in time, an SLO sets a target for that SLI over a longer period (months, years) and lets you decide how much uptime you actually care about. It helps you tune your reliability targets so that you are not overachieving them and slowing down the release cycle.
SLA - Service level agreement: a promise the service provider makes to the customer, based on meeting certain SLOs. It comes with a set of penalties if it is not met.
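As a rough sketch of how these fit together, the snippet below computes two SLIs (availability and a latency percentile) from a list of request records and checks them against SLO targets. The Request type, the nearest-rank percentile method and the target values are assumptions chosen for illustration, not a standard API.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool  # True if the request succeeded


def availability(requests: list[Request]) -> float:
    """SLI: share of requests that succeeded."""
    if not requests:
        raise ValueError("no requests to measure")
    return sum(r.ok for r in requests) / len(requests)


def latency_percentile(requests: list[Request], pct: float) -> float:
    """SLI: latency at the given percentile, using the nearest-rank method."""
    if not requests:
        raise ValueError("no requests to measure")
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(sli_value: float, target: float,
              higher_is_better: bool = True) -> bool:
    """Compare a measured SLI against its SLO target."""
    return sli_value >= target if higher_is_better else sli_value <= target


if __name__ == "__main__":
    # Hypothetical sample: nine fast successes and one slow failure.
    sample = [Request(100.0, True)] * 9 + [Request(900.0, False)]
    print(meets_slo(availability(sample), target=0.999))
    print(meets_slo(latency_percentile(sample, 50), target=300.0,
                    higher_is_better=False))
```

An SLA would then sit on top of this: a contractual commitment, usually looser than the internal SLO, with penalties if the measured SLIs fall short.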
Summary -
To conclude, DevOps and SRE are not competing against each other. They share the same goal but differ in focus. Properly implemented DevOps will necessitate SRE. The scope of a DevOps and/or SRE implementation should be decided based on the goals and size of the organization.
Way to go!