When a production issue happens, technical skill is not the first thing tested. Decision quality is.

Last month, I faced a backend incident during a high-traffic period. The team had 2 options:
• Ship a quick patch directly in a critical flow
• Roll back, stabilize, and fix with safer validation

The quick patch looked faster. But it also increased risk in a part of the system with many integrations. We chose to roll back first.

What happened after that:
• Incident impact was reduced quickly.
• We had time to identify the real root cause.
• The final fix was smaller, clearer, and safer to maintain.

My main lesson: In pressure moments, good engineers don’t choose the fastest code change. They choose the option with the best risk/clarity trade-off. This is where architecture and communication work together.

How do you usually decide under pressure: quick patch or rollback first?

#SoftwareEngineering #BackendEngineering #SoftwareArchitecture #SystemDesign #EngineeringMindset #ScalableSystems #TechGrowth
Nice post! To answer your question: it depends on how safe the quick fix is. If it’s low-risk and well understood, I go for it. However, I always like to have a rollback ready in case things go sideways. If there’s uncertainty, I prefer to roll back first and stabilize.
Nice post, Mateus Eduardo Pereira. I’ve seen similar scenarios in microservices architectures. A quick patch in a critical flow can propagate inconsistencies across multiple services. Rollback + proper validation is often the smarter path.
Great insight. Thank you for sharing
Great post!
Rolling back first is also a way of buying clarity: once the pressure of the active incident is off, the root cause tends to surface faster and the fix ends up simpler.