Solving Complex Challenges
Without complex challenges, organisations would all be the same and there would be little competition or growth. But those who identify and tackle them will succeed above the rest.
Complex challenges are those which don’t yet have an obvious or easily achievable solution; they are difficult but fun to solve. And by the way, you want complex challenges - if you are only addressing the easy stuff you aren’t going fast enough. That’s a re-hash of a popular Mario Andretti quote:
“If everything seems under control, you’re not going fast enough”.
You’ve got to push the boundary of what is possible and be prepared to fail. If you are not failing, push harder; you haven’t found your limit yet. If you don’t, others will, and you’ll be left behind. Don’t sit there polishing an old sub-standard solution. Be brave, take risks, follow a path of revolution.
Describing Challenges
I tend to start by framing them within a context of:
- Business Challenges - Which capabilities we are talking about, and what positive impact (features, improvements, etc) we are trying to achieve;
- Technology Challenges - Quite often you’ll be considering an existing capability, so it’s a good idea to point out existing technology constraints;
- Organisational Challenges - I find this is always a major area. For example, have you got a strategy in this problem space? Are there teams who feel sidelined? What about regional vs global tensions, or the skill and experience of the team tackling the challenge?
Often solving the technology challenges is the easy part. Making sure the business challenge is really going to have a great impact is not easy - always a risk. But by far the most complex issues are those organisational challenges.
Next, I like to describe the modern, abstract technology patterns (along with a couple of concrete examples) which can help. I often avoid any vendor-specific patterns to prevent an emotive debate on implementation choice. Hopefully, people agree those abstract patterns make sense; if not, work on that. These patterns should describe an ideal design for the future.
Finally, outline an approach. If the patterns describe a “compass direction” for design, then the approach describes how we quickly get there in an iterative fashion whilst incrementally satisfying the business challenges along the way: doing PoCs, tactical workarounds, etc. We all know big bang changes don’t often work.
A word of warning, as I’m particularly prone to doing this myself: avoid a big reveal of a grand plan. Instead, host a discussion in which you whiteboard each of those three areas above. By all means come with some pre-prepared thinking, but discuss, don’t present. I need to take my own advice here more often; I often just get too excited when sharing. People will be far more engaged and willing if you take them along on the journey.
Classifying Challenges
I like to generalise the challenge space as follows (credit goes, unknowingly, to Donald Rumsfeld):
- Known-Knowns: Challenges can be succinctly described and there are known solutions;
- Known-Unknowns: Challenges can be succinctly described but the solutions aren’t known or obvious (at least to the people looking at it);
- Unknown-Unknowns: There are challenges, but who knows what they are, let alone what the solutions to them should be.
Consider, for example, website security challenges. Cross-site scripting (XSS) or SQL injection attacks are Known-Knowns. Those challenges have been around for years. You don’t need to build complicated solutions to address them, since the techniques and protections against them are well understood - a properly configured web application firewall would do the job. However, there are different solution implementations, and how easy they are to put to work varies. The individual components might be complicated, but if you are using them as a cloud PaaS/SaaS, for example, then the solution is greatly simplified.
Another well publicised web app attack is DDoS (Distributed Denial of Service), so it’s a known problem. The solution, though, really depends on a good architecture and infrastructure strategy. It might not be clear to some, and even where it is, some architectures make good protection difficult to achieve. We potentially have a Known-Unknown here (depending on circumstances).
As for Unknown-Unknowns, these can be anything. A bit like the previous example though, the threat might be known to some people but not to others. So a clued-up team might understand that bots can be scripted to emulate human behaviour with stolen identities, but may or may not know how to defend against them.
By default all you have is Unknown-Unknowns. The challenge you should set yourself is moving as much as possible into the Known-Known space, so it can be solved. The other two types (Known-Unknowns and Unknown-Unknowns) are the domain of research and can’t be effectively solved by grunting away. I would also say that a team at the top of their game will be much more likely to move challenges quickly into the Known-Known space. In contrast, an inexperienced or unmotivated team will likely linger in the Unknown-Unknown space. Organisational culture, delivery approaches, etc will all affect the efficiency with which complex challenges can be addressed. Ownership, motivation, skill and experience are critical to success.
Before we move on let’s cover the three challenge categories in a bit more detail.
Known-Knowns
Challenges can be succinctly described and there are known solutions to them. These require little further effort. Research isn’t required. Just prioritise them on the backlog and get them done. There might be an embarrassment factor - you might think you should already be doing these things.
These are the lowest risk and cheapest challenges to address, and they make early quick wins. However, if all your challenges start in this space, you aren’t pushing hard enough.
Known-Unknowns
Challenges can be succinctly described but the solution isn’t known or obvious (at least to the group looking at it). So these might be well understood problems which have well understood solutions, but, well, you don’t know what you don’t know. These are good research candidates. It’s also a pretty good idea to kick the idea around with your colleagues if you haven’t done so already. These are things you might think your competitors are focusing on, or perhaps things your CEO stretches the truth about the organisation currently being able to do (I’ve been on the end of that before…).
The approach is to research solutions (through PoCs, etc) in order to articulate what the solution could be (you might come back with options), at which point the problem simply becomes a Known-Known.
Addressing these challenges is high-risk and potentially high-cost. Following research you might change the definition of the challenge you are trying to solve, or abandon it for a while. Spending time and money here is about understanding and de-risking, not delivery. You need to be content that the failure rate of research here could be as high as 50%. Increasing the success rate depends on the quality of your team in selecting the right challenges, along with their ability to see them through. It doesn’t just have to be your own team though. These challenges also make great time-and-materials work packages for engaging consultancies, which can be a really good way of bringing in new ideas.
Unknown-Unknowns
There are problems, but who knows what they are, let alone what the solutions to them should be. This is the super hard stuff. It’s open research into a broad vision. It’s usually low yield and high cost, but if you can get something out it can be a game changer for the industry, let alone your organisation.
Similar to before, you’ll want to move challenges out of this space and into the Known-Unknowns. Because this sort of open-ended research can get expensive very quickly, it’s better to bound effort by time or money and then review the results before investing further. Again, you’ll never deliver capabilities directly out of this space, but you might be hacking PoCs together. Just don’t be results driven; focus on getting a better understanding of the art-of-the-possible.
A lot of products we take for granted came from people working in this domain - smartphones, tablets, etc. I remember industry analysts slating the launch of the Apple iPad, saying nobody would buy it. Soon after, it was out-selling laptops: humble pie for the naysayers. But those products took years to get from an idea into the hands of customers. Perseverance and patience, with a relentless personal obsession with doing amazing things and not settling for mediocrity, are required.
Effect on engineering approach
Research is very different to delivery. I would argue you can’t efficiently solve complex challenges following a traditional iterative delivery approach like RUP/UP - and you certainly don’t want a waterfall approach. But that doesn’t mean no discipline - in fact research probably requires better discipline than normal engineering. Also, you might want to consider encouraging better ownership which should help motivate people to tackle each of those three challenge domains.
In contrast to traditional approaches, consider one which allows teams to come up with ideas, research them, deliver early solutions into production and have people use them, all the while creating an environment in which these new ideas can be delivered quickly and fail quickly but safely. The principle is that a failure of one thing must not affect another. It’s good to fail; we learn more from our failures than our successes. Effective leadership in this area requires continual improvement - rehearsal, practice and repetition. It’s about communication and expectation management, setting up accountabilities and holding people to them. Failure to deal with change, resistance, new ideas and opportunities may be terminal. No command and control. Instead praise, engage, motivate, pay well, and give opportunities for promotion and development. All this is essential to a ‘feel good’ environment in which ideas can flourish and people are encouraged to lose those imaginary boundaries between what is actually possible and common perception.
Obviously you can’t make this change overnight, but hopefully you’ll see this as a good direction to head towards. Even if you start by running mini-RUP/UP on a per capability area, you’ve made a step in the right direction - just keep improving your approach and really work on cracking the ownership challenge. Also focus on moving people to think about long-term, continual, pro-active capability improvement rather than a software delivery factory where you simply “transition to BAU operations” at the end of every phase - that’s a recipe for instant legacy.
People solve challenges, not technology
OK, so practically how would this work? Does this mean teams get to run around crazy and release shoddy code? Obviously not.
I recommend starting by defining the capabilities that make up how your organisation makes money as a business right now and how you want it to make money in the future. Group teams primarily around those capabilities rather than horizontal services. A team should contain everybody needed to design, build, operate and improve that capability with little or no external dependencies - engineers, operations, management, etc. These groups of people have the responsibility to ensure a success is made. To avoid any finger pointing or “we couldn’t do it because we were waiting on X” arguments, teams need the freedom to work as they feel is best for them, without depending too much on others outside their team.
In doing this, the organisational challenge is to focus on strong leadership (rather than traditional line management) with a light touch, to ensure capabilities are optimally aligned to each other and that all blockers are removed. Within this sort of open and free structure the only limits on innovation and performance are the people in the teams and the budget. Many of the top technology organisations (Google, Amazon, Netflix, Spotify, etc) follow this general approach.
Solving complex challenges in this sort of culture should be much easier - somebody has a good idea, you prioritise it based on its impact, then get to work on the fun task of tackling it.
But for many there is a change challenge here. How do you encourage people in a traditionally structured company, where IT has taken a backseat and where they have been in the job a long time (sometimes decades), to want to make such a transition? They have to decide they want it; forcing them will result in failure. In my experience, most engineers want to head in this direction. The challenge many of them have is how to execute an incremental change to get there - certainly a big bang change isn’t going to work. Their main concerns typically surround the fact that their current commitments already take up all their time. Age, gender, etc are not barriers; only people’s attitude and ability to self-improve matter. Hard skills like learning a new programming language can easily be taught in weeks. There is no quick route to experience though…
Also, many engineers who are used to traditional ways of working may not have kept up-to-date with modern technology and practices, so make sure to supplement teams (expert contractor resources work well) with expertise to help them be successful. Creating a great culture without having skilled engineers won’t work.
Some recommendations on how you might go about helping engineering teams with making that change:
- Backfill people with temporary staff to allow them to focus on achieving the change;
- Encourage them to delegate more responsibility to others;
- Use expert engineers to help teams be successful, bring in new people - employees, contractors, consultancies;
- Finally, if some people don’t want to be part of the team, then support them in finding another role, either inside or outside the organisation. Leaving people sitting outside the team will alienate them into throwing rocks, which creates a toxic environment.
Once the change is working well, their original responsibilities will likely find homes with others as team responsibilities align to capabilities. And let’s be clear: people solve challenges, not technology (technologies are just components of a solution), so creating the right environment for those people to thrive is critical to success.
Choreography not orchestration
Continually viewing the way your organisation works as an ‘end-to-end’ process will likely result in rigid, inflexible teams that become highly dependent on each other, making it hard to solve complex challenges. You’ll have a very brittle organisation that struggles to respond to change because of the ripple effects any change has on everybody. I describe this approach with a term from microservice architecture antipatterns - orchestration, where there is some ‘over-arching’ process controlling everything.
The alternative pattern is choreography, where there is no centralised control. You describe what it means to be a ‘good citizen’ in the organisation’s ecosystem. A key principle is that when something significant happens you broadcast an event to tell everybody. Other citizens can then subscribe to those events and carry out whatever action they feel is necessary. Importantly, the citizen producing the event isn’t telling consumers what to do next. Effectively, we inform each part of the system what its job is and allow it to work out how to achieve it.
This is a transformational change.
Taking an example from the popular book Building Microservices by Sam Newman, consider a customer loyalty capability, which will create a new loyalty account whenever a new customer account is created with the following high-level process:
- A new record is created in the Loyalty Points Bank for the Customer;
- The Postal system sends out a Welcome pack;
- The Email system sends out a Welcome email.
This can be modelled as follows:
Imagine the Customer service acts as the central controller, orchestrating the process flow. On creation of a new record it talks to the Loyalty Points Bank service, the Postal service and the Email service. The Customer service tracks where the customer is in this process, dealing with exceptions as they occur. The downside of this approach is that the Customer service becomes a central orchestration service - a hub in the middle of a web, a central point for logic surrounded by dumb services. As ideas and requirements grow, that Customer service becomes an ever-increasing sink for business logic. It will be hard to introduce change, as we’ll always need to test that end-to-end process, and it’s unlikely we can re-use that logic elsewhere. The underlying services also become so dumb they are basically baby-sitting databases. Value becomes locked inside specific processes.
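To make the orchestrated shape concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class and method names (`CustomerService.create_customer`, `LoyaltyPointsBank.create_account` and so on) are hypothetical and are not taken from the book, and in-memory lists stand in for real systems.

```python
from dataclasses import dataclass, field


@dataclass
class LoyaltyPointsBank:
    accounts: dict = field(default_factory=dict)

    def create_account(self, customer_id: str) -> None:
        self.accounts[customer_id] = 0  # a new account starts with zero points


@dataclass
class PostalService:
    sent: list = field(default_factory=list)

    def send_welcome_pack(self, customer_id: str) -> None:
        self.sent.append(customer_id)


@dataclass
class EmailService:
    sent: list = field(default_factory=list)

    def send_welcome_email(self, customer_id: str) -> None:
        self.sent.append(customer_id)


class CustomerService:
    """Orchestrator: knows every downstream service and the order of the steps."""

    def __init__(self, bank, postal, email):
        self.bank = bank
        self.postal = postal
        self.email = email
        self.progress = {}  # customer_id -> list of completed steps

    def create_customer(self, customer_id: str) -> None:
        steps = self.progress.setdefault(customer_id, [])
        # The whole end-to-end process lives here, so adding a step means
        # changing (and re-testing) this method.
        self.bank.create_account(customer_id)
        steps.append("loyalty_account")
        self.postal.send_welcome_pack(customer_id)
        steps.append("welcome_pack")
        self.email.send_welcome_email(customer_id)
        steps.append("welcome_email")


bank, postal, email = LoyaltyPointsBank(), PostalService(), EmailService()
customers = CustomerService(bank, postal, email)
customers.create_customer("c-1")
```

Notice that `CustomerService` holds a reference to all three collaborators and owns the progress tracking, which is exactly the hub-in-the-middle shape described above.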
With a choreographed approach, we could instead have the Customer service simply emit a message saying Customer created. The other components can then just subscribe to these events and react according to the job they need to do.
The downside is that the high-level process is only implicitly modelled, which means additional work is required to track progress. But the big upside is that because we are not explicitly modelling processes, we are set up to be very flexible to change. For example, what if we’d like information on new Customers to be fed into an analytics system? That system just needs to subscribe to those events. The Customer service has no knowledge of what other services are consuming the events it produces, so we don’t need to test the impact of introducing new components. And because we are building logic into services (instead of orchestration components), the value we create becomes easily reusable.
We end up with a message/event driven architecture, which gives us a big safety net to introduce new components quickly without having any effect on existing components.
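A minimal Python sketch of the choreographed version might look like the following. The `EventBus` class is a hypothetical in-process stand-in for a real message broker, and the event name `customer_created` and the handler names are my own assumptions for illustration, not anything prescribed by the book.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a message broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # event type -> handlers

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._subscribers[event_type]):
            handler(payload)


class CustomerService:
    """Only announces that something happened; it gives no instructions."""

    def __init__(self, bus):
        self.bus = bus

    def create_customer(self, customer_id):
        self.bus.publish("customer_created", {"customer_id": customer_id})


# Each subscriber decides for itself how to react to the event.
loyalty_accounts, welcome_packs, welcome_emails, analytics_log = {}, [], [], []


def open_loyalty_account(event):
    loyalty_accounts[event["customer_id"]] = 0


def post_welcome_pack(event):
    welcome_packs.append(event["customer_id"])


def send_welcome_email(event):
    welcome_emails.append(event["customer_id"])


def record_for_analytics(event):
    analytics_log.append(event["customer_id"])


bus = EventBus()
bus.subscribe("customer_created", open_loyalty_account)
bus.subscribe("customer_created", post_welcome_pack)
bus.subscribe("customer_created", send_welcome_email)

customers = CustomerService(bus)
customers.create_customer("c-1")

# Later, analytics joins in without any change to CustomerService.
bus.subscribe("customer_created", record_for_analytics)
customers.create_customer("c-2")
```

The analytics handler is added with one `subscribe` call and the `CustomerService` class is untouched. The trade-off also shows up: nothing in this sketch records how far each customer has got through the overall process, so tracking progress would need extra work.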
The con is you will likely need a careful look at how you express and model business requirements. If you try to use this architectural technique when your business requirements are stuck in end-to-end process world, then you’ll hit Conway’s Law as your requirements and designs grind against each other. You need to cut both your architecture and how you describe the business at the same grain.
I also recommend applying this logic to your organisational structure. Give teams clear responsibility and focus on a single area, without dependencies on or impacts to others. It’s a leadership responsibility to provide that choreography between teams to ensure overall success.
Think about it another way. If you want to get promoted at work, you could either a) focus on achieving promotion, continually justifying your position and managing it as if it were a process; or b) focus on each area of competency individually, try to do well in those areas, push your boundaries into new areas, and behave autonomously while working well with others, so that promotion happens naturally. Which approach do you think is more likely to succeed? OK, that analogy is a bit of a stretch, but hopefully you get the point.
Considering all this should help provide an environment which can promote innovation and growth, in which complex challenges can be addressed quickly and safely.
Measuring impact
Instead of measuring progress by working software (one of the Agile principles), start measuring by impact - did that new idea have the positive impact you intended? If not, fail fast: fix it or remove it. Delivering working software isn’t enough. Relentlessly seek and quickly implement ideas which you think will have maximum impact, and don’t pat yourself on the back until you have measured that impact being achieved.
Taking risks, failing fast and measuring impact rather than working software are well accepted methods at some of the most forward-thinking and popular tech companies.
Impact can come in various forms, for example:
- Customers can now place orders via our website, since go-live we have had an increase of 20% in new orders;
- Operations teams can now manage media via the web portal vs their legacy desktop tools. It has reduced media processing times down from 3 days to just 1 hour;
- Engineers have automated the provisioning of all environments, allowing us to create/destroy environments on demand within minutes. By doing this, last month we cut hosting costs by 20% and reduced bugs related to environment differences by 100%, and we can now perform releases 4hrs quicker than before.
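Statements like these boil down to a before/after comparison against what you intended. As a rough sketch in Python: the `ImpactCheck` class is hypothetical, the 3 days to 1 hour figure comes from the example above, and the target values and the order counts are made-up numbers purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImpactCheck:
    name: str
    before: float
    after: float
    target_change_pct: float  # intended change; negative means a reduction

    @property
    def change_pct(self) -> float:
        """Relative change from before to after, as a percentage."""
        if self.before == 0:
            raise ValueError("baseline must be non-zero")
        return (self.after - self.before) / self.before * 100.0

    @property
    def achieved(self) -> bool:
        """True when the measured change meets or beats the intended one."""
        if self.target_change_pct < 0:
            return self.change_pct <= self.target_change_pct
        return self.change_pct >= self.target_change_pct


# Media processing: 3 days (72 hours) down to 1 hour; the -90% target is assumed.
media = ImpactCheck("media processing hours", 72.0, 1.0, target_change_pct=-90.0)

# New orders: hypothetical counts giving a 20% rise against an assumed 25% target.
orders = ImpactCheck("new orders per month", 1000.0, 1200.0, target_change_pct=25.0)

for check in (media, orders):
    verdict = "achieved" if check.achieved else "missed: fix it or remove it"
    print(f"{check.name}: {check.change_pct:+.1f}% ({verdict})")
```

The point of writing the target down up front is that a 20% rise in orders only counts as success if 20% was what you were aiming for.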
I feel this concept of ‘impact’ has much more meaning than ‘value’, for these reasons:
- Impact is usually easier to measure, value is sometimes a bit subjective;
- Value is often tied to ‘customer value’, as such, for tasks like ‘automate environment provisioning’ it’s difficult to express customer value - but you should easily be able to express the impact.
It’s important that everything you do has impact - otherwise why are you spending time on it when you could be working on things that do? Measuring impact is a fair way to consider all work that needs doing, not just customer facing features.
It’s especially relevant when you are looking at solving complex challenges, simply because by their nature they will take time, so picking the right ones to solve is a critical skill.
Hopefully this article has been informative and gives you some different approaches to consider when solving your own complex challenges.