Risk Explained

Thomas Fejfer

Published May 6, 2025

for people who build and run things

If you operate or manage technical systems, then you already deal with changes. To keep systems running or improve performance, you carry out different kinds of tasks. Some are routine maintenance. Others involve changes like upgrades, modifications, repairs, or replacements.

Every time you make a change, there’s a chance that something won’t go as planned — a setting might be missed, a part might not fit, or an unexpected interaction might cause trouble.

This article is for people working in these environments — engineers, technicians, or managers — who are responsible for or directly involved in that kind of technical work.

1. What is risk?

Before we can handle risks effectively, we need a clear understanding of what risk is. The international standard ISO 31000 defines risk as “the effect of uncertainty on objectives.” It’s a concise definition — but still abstract in practice. A more hands-on version comes from Dr. David Hillson:

uncertainty that matters

In other words, for something to be a risk, it must be uncertain — and it must matter to us. That’s why all risks involve uncertainty, but not all uncertainties are risks. The distinction is important because it helps us separate risks from their likely causes.

Example: We're planning to apply the latest security patch to all of our Windows Server 2025 machines. Some of the servers have pending disk errors. These disk errors are not risks; they are known issues, or certainties. But they are a likely cause of a real risk: that the servers may fail to restart cleanly after patching. That is the risk, an uncertainty that matters, because it could affect uptime or performance. The same risk could have other likely causes as well.

So, is it important to know the difference between a risk and a likely cause? Absolutely — if you want to work systematically with risk and avoid serious trouble. Confusing the two can lead you to miss real uncertainties that can hurt you, or waste time fixing things that aren’t actually risks at all.
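The distinction can also be written down as a small data model. The following is only an illustrative sketch in Python, not part of any standard or tool: the names `Cause`, `Risk`, and `is_risk` are hypothetical and chosen here to mirror the patching example above.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    """A known condition (a certainty) that could contribute to a risk."""
    description: str


@dataclass
class Risk:
    """An uncertain event that matters because it affects an objective."""
    event: str      # what might happen
    affects: str    # the objective it would hurt
    likely_causes: list[Cause] = field(default_factory=list)


def is_risk(uncertain: bool, matters: bool) -> bool:
    """Hypothetical helper for the 'uncertainty that matters' test."""
    return uncertain and matters


disk_errors = Cause("Some servers have pending disk errors")
restart_failure = Risk(
    event="Servers fail to restart cleanly after patching",
    affects="uptime and performance",
    likely_causes=[disk_errors],
)

# The disk errors are already known, so they are not a risk in themselves.
print(is_risk(uncertain=False, matters=True))  # False
# The restart failure is uncertain and would hurt uptime, so it is a risk.
print(is_risk(uncertain=True, matters=True))   # True
```

Keeping causes as a list attached to the risk reflects the point that one risk can have several likely causes, and that fixing one cause does not by itself remove the risk.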

2. Why is risk important?

Changes to technical systems don’t always go as planned — and when they don’t, the impact can be serious. Client data shows that around 70% of all incidents are caused by our own changes. According to Gartner, that number rises to 85% for performance-related incidents. This means that identifying and reducing risks before making a change is one of the most effective ways to avoid disruption.

If outcomes like system performance, availability, security, or functionality are important, we must actively think ahead about what might go wrong.

Because risks haven’t happened yet, we have the chance to act before small uncertainties turn into big problems. By identifying and dealing with uncertainties early, we give ourselves the best possible chance to maintain performance, avoid disruptions, and achieve our goals.

3. How do we manage risks?

Managing risk doesn’t need to be complicated. At its core, it’s a simple and logical way of thinking. It builds on a few powerful questions that help us stay in control and avoid surprises.

We begin by asking: What are we trying to achieve? This defines the task or change we are going to make, for example, opening a firewall port while preserving performance and security.

Then we ask: Does this task need a closer look? Not everything does. If it’s routine, well understood, and low impact, we can often just go ahead and execute. But if there’s any real potential for uncertainty or consequences, we ask the next questions.

That brings us to: What might affect us? This helps us identify the uncertainties — the things that could influence how the task plays out.

Next, we ask: Which are the big ones? That’s how we identify the risks — the uncertainties that could affect something important.

Finally, we move to action with two questions: What can we do about it? And what do we do about it? The first surfaces our options, and the second commits us to a choice. Together they help us reduce uncertainty, avoid trouble, and give the task the best possible chance of success.

By asking and answering these questions systematically, we stay proactive and protect the availability, performance, and reliability of our systems and services.
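The sequence of questions above can be sketched as a simple triage flow. This is a minimal illustration in Python under stated assumptions: the function names, the `Uncertainty` fields, and the two example uncertainties for the firewall change are all hypothetical, and a real assessment would rely on judgement rather than boolean flags.

```python
from dataclasses import dataclass


@dataclass
class Uncertainty:
    description: str
    matters: bool     # would it affect something important?
    action: str = ""  # what we decide to do about it


def needs_closer_look(routine: bool, well_understood: bool, low_impact: bool) -> bool:
    """'Does this task need a closer look?' Skip only if all three hold."""
    return not (routine and well_understood and low_impact)


def identify_risks(uncertainties: list[Uncertainty]) -> list[Uncertainty]:
    """'Which are the big ones?' Keep only the uncertainties that matter."""
    return [u for u in uncertainties if u.matters]


# 'What are we trying to achieve?'
objective = "Open a firewall port while preserving performance and security"

# 'What might affect us?' (hypothetical examples)
uncertainties = [
    Uncertainty(
        "Rule may expose more hosts than intended",
        matters=True,
        action="Restrict the rule to the required source addresses",
    ),
    Uncertainty("Ticket may be filed under the wrong category", matters=False),
]

if needs_closer_look(routine=False, well_understood=True, low_impact=False):
    # 'What do we do about it?' Report each risk with its chosen action.
    for risk in identify_risks(uncertainties):
        print(f"RISK: {risk.description} -> {risk.action}")
```

The early exit in `needs_closer_look` mirrors the idea that routine, well-understood, low-impact tasks can go straight to execution, so the remaining questions are only asked when they are worth the effort.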

This is also where approaches start to differ depending on your domain. For technical operations, see “Operational Risk Handling - good practice for avoiding technical problems”, a method aimed at managers, engineers, and technicians who make decisions in real time and under real constraints. It builds on practical experience as well as the risk theory introduced above, and brings the two together in a structured, field-ready method.

Acknowledgement

This document draws on insights and best practices from several respected sources, including:

ISO 31000 Risk Management Guidelines
Dr. David Hillson’s answers in the “100 Risk Questions” video series by Dr. Salim Al-Harthi
and other practical frameworks commonly applied in technical and operational environments.
