"Ops", the dark side of DevOps

"Our service is down! Does anyone have a clue how to fix this?" Often it is at such a key moment, when things are in trouble, that everybody suddenly realizes the value of Ops. When Ops start playing firefighter. When, in the middle of a full-blown crisis, they scan logs and metrics and check the technical stacks of virtual servers, containers, routing, firewalls, load balancers, microservices and storage, until they find the damn thing that went wrong and manage to escape their dark operational nightmare.

But let's rewind a little. DevOps stories usually start as a quest for speed. The pressure to stay ahead of the competition is strong: deliver features, and deliver them fast. Thanks to today's technology, there are incredible ways to reach that speed: microservices, virtualization, containers, clouds, configuration management, software-defined infrastructure... To bridge the gap, tools to build a continuous deployment pipeline from a developer's laptop to production are no longer a dream. So you put the pieces together and...

...You are now fully DevOps!

Or are you? Here comes the Ops part. If one side of Ops is fighting fires, the other is preventing them. Unfortunately, this side is just as dark. It means that every component of the final service, software obviously included, must be tested and benchmarked, must expose run-time indicators of usage and performance, must send carefully crafted logs to the proper location, must fail gracefully, and so on. Implementing all this in practice takes time and effort, with no immediately visible benefit, unlike that new feature we have been waiting on for weeks.

In the end, Ops' contribution is what keeps a DevOps initiative from turning into an unstable monster. Ops' proposals and requests will have a cost, may be misunderstood, and their benefits can be hard to see at first. But a proper DevOps model should ensure that these requests have their place in the backlog.

Even then, it will not make all incidents vanish. But the next time your direct-routing load-balancer VM automatically migrates to the same hypervisor your web server VM runs on, and the segmentation offload option triggers a nasty bug that makes network traffic sluggish, at least you'll find out what is going on in minutes, not hours.


That's why I'm so proud to be a "runner", even though it is so hard to motivate people for such a difficult, but so captivating, job.
