Whether you’re purposely using micro-services, having to integrate with 3rd party systems, or just the team down the hall’s services, it’s almost inevitable that an enterprise system will have to communicate with something else. Or at the very least have a need to do some kind of background processing within the same logical system. For all those reasons, it’s not unlikely that you’ll have to pull in some kind of asynchronous messaging tooling into your system.
It’s also an imperfect world, and despite your best efforts your software systems will occasionally encounter exceptions at runtime. What you really need to do is to plan around potential failures in your application, especially around integration points. Fortunately, your asynchronous messaging toolkit should have a robust set of error handling capabilities baked in — and this is maybe the single most important reason to use asynchronous messaging toolkits like MassTransit, NServiceBus, or the Jasper project I’m involved with rather than trying to roll your own one off message handling code or depend strictly on web communication via web services.
In no particular order, I think you need to have at least these goals in mind:
- Craft your exception handling in such a way that it will very seldom require manual intervention to recover work in the system.
- Build in resiliency to transient errors like networking hiccups or database timeouts that are common when systems get overtaxed.
- Limit your temporal coupling to external systems both within your organization or from 3rd party systems. The point here is that you want your system to be at least somewhat functional even if external dependencies are unavailable. My off the cuff recommendation is to try to isolate calls to an external dependency within a small, atomic command handler so that you have a “retry loop” directly around the interaction with that dependency.
- Prevent inconsistent state within your greater enterprise systems. I think of this as the “no leaky pipe” rule where you try to avoid any messages getting lost along the way, but it also applies to ordering operations sometimes. To illustrate this farther, consider the canonical example of recording a matching debit and withdrawal transaction between two banking accounts. If you process one operation, you have to do the other as well to avoid system inconsistencies. Asynchronous messaging makes that just a teeny bit harder maybe by introducing eventual consistency into the mix rather than trying to depend on two phase commits between systems — but who’s kidding who here, we’re probably all trying to avoid 2PC transactions like the plague.