Process First, Then Data
Have you ever read something about your industry written by an expert that you just completely disagreed with?
I like to keep up with the blogs of other BPM software and services vendors, and I recently ran across a post that advocated starting with data first, then building process models around it. The post came from a major BPM vendor that should know better. But then, they were pushing a new product feature around defining data.
This is just wrong.
It is rare that I see something published in BPM circles that I completely disagree with - rather than having nit-picking differences with - but this was one of those times. The data-then-process mindset is an old-school waterfall way of thinking. I’ve seen so many BPM projects either fail or get bogged down because the BPM practitioners thought they needed to define all the data before they could build any processes.
Here Is What Happens When You Start Data-First:
- You overproduce. When you define all of your data before you define your processes, you are still guessing at what the processes will require, and you end up with lots of data that isn’t needed.
- You miss data elements that you do need. Because you aren’t starting from the requirements of the process, when you do start defining the process, you’ll find that it requires data that isn’t in your definition and schema (or the supporting access services you build around the data).
- Your schema becomes impossible to change. You build the schema, then services against that schema, and then queries, and pretty soon you can’t change the schema at all without breaking everything that depends on it. This isn’t about how hard it is to change a database definition; it is about the effects on everything else built on top.
- A data-first approach completely ignores process and how people interact with process. If you’re building a process, this is a problem, because the process and how people interact with it shape the way you store the data that supports the process.
So don’t start with data. Start with people and process and then use the business and technical requirements you discover along the way to define the data needed to support the process. If the data isn’t ready at hand, find out where to enter it and where to retrieve it, and then tackle it accordingly. But don’t let data define the limitations and scope of your process.
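One way to picture deriving data from process is the minimal sketch below. It is only an illustration: the `Step` class, the expense-approval steps, and every field name are hypothetical and do not come from any BPM product. Each step declares the data it reads and writes; the data model is simply the union of those declarations, and any field that is read before anything supplies it is flagged as data you still need to find a source for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A process step and the data it touches (hypothetical model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def required_fields(steps):
    """The data model the process needs: every field some step reads or writes."""
    needed = set()
    for step in steps:
        needed |= step.reads | step.writes
    return needed


def unsourced_fields(steps, process_inputs=()):
    """Fields a step reads before any input or earlier step has supplied them."""
    available = set(process_inputs)
    missing = set()
    for step in steps:
        missing |= step.reads - available
        available |= step.writes
    return missing


# Hypothetical expense-approval process, discovered by walking through it with the business.
STEPS = [
    Step("submit", writes=frozenset({"amount", "requester"})),
    Step("approve",
         reads=frozenset({"amount", "requester", "cost_center"}),
         writes=frozenset({"approved"})),
    Step("pay", reads=frozenset({"approved", "amount", "bank_account"})),
]

if __name__ == "__main__":
    print(sorted(required_fields(STEPS)))
    print(sorted(unsourced_fields(STEPS)))
```

In this sketch the process asks for `cost_center` and `bank_account` without any step providing them, so those are the fields for which you would go find out where to enter or retrieve the data. Nothing outside the five fields the steps touch ever makes it into the model.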
Caveats Apply
Some will misinterpret this post, so let me offer a few caveats to clarify:
- There are plenty of cases where a data store is already defined and forms an integration point for the process to leverage. Understanding that existing schema and definition will be important for the implementation-level details of your process design.
- Understanding the entity life cycle of business objects is incredibly important to good process design. However, your job with process design is not to define those objects for the business, but to define what you need from them for the process, and what transitions or state changes to those entities might result from the process, whether as side effects or as primary outcomes. What is an entity life cycle? Think of any business object that reasonably exists before and/or after your process instance executes, or that may be leveraged across any number of instances over time. That's an entity that has its own life cycle. Objects scoped only to the lifetime of the process instance are, reasonably, just process variables or process data.
- When you start process-first, you'll iterate on data design without encoding it and surrounding it with the poured concrete of software implementation. Only when you're relatively set on the process needs will you start to commit the data design to a data store, integration methods, etc.
- Good BPMS tooling will give you quick-and-dirty data design tools that neither require nor assume that those tools are used before process design commences (nor that process design is complete).
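The distinction between an entity with its own life cycle and data scoped to a single process instance can be sketched in a few lines. This is an illustration under assumptions, not how any particular BPMS models it: the `Order` entity, its states, and the `ApprovalInstance` class are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class OrderState(Enum):
    OPEN = "open"
    APPROVED = "approved"
    SHIPPED = "shipped"


# Allowed state changes in the (hypothetical) entity life cycle.
TRANSITIONS = {
    OrderState.OPEN: {OrderState.APPROVED},
    OrderState.APPROVED: {OrderState.SHIPPED},
    OrderState.SHIPPED: set(),
}


@dataclass
class Order:
    """Entity: exists before and after any single process instance."""
    order_id: str
    state: OrderState = OrderState.OPEN

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.state = new_state


@dataclass
class ApprovalInstance:
    """One process instance; its variables live only as long as it does."""
    order: Order
    variables: dict = field(default_factory=dict)

    def run(self, approver):
        # Process data: only meaningful while this instance is executing.
        self.variables["approver"] = approver
        # Primary outcome of the process: a state change on the entity.
        self.order.transition(OrderState.APPROVED)


if __name__ == "__main__":
    order = Order("A-1")             # the entity predates the process instance
    instance = ApprovalInstance(order)
    instance.run(approver="dana")
    del instance                     # the instance and its variables go away
    print(order.state)               # the entity keeps the state the process left it in
```

The process design only has to say what it needs from `Order` and which transitions it causes; it does not have to define the whole object for the business.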
Common Objections to Process First
Don't you start process design by defining its outputs, which are just data, by the way? Of course, a first principle of process design is to define the outputs or, more generally, the "outcomes". I equate discovering the process with discovering the requirements of the process (and yes, that starts with understanding the destination). Destination is part of process design (so is origination!). But it is a mistake to think that the destination is only data.
Destination would include all kinds of outcomes - customer success, revenue and margin impact, meeting previously agreed-to SLAs, etc. Data is just one of many things that feed into good process. If you start with data, you're going to miss the perspective of all the rest...
Shouldn't we design comprehensive data up front, and collect "all the data" for data mining? When you do data first, you think, "Oh, they might collect or need this, so we should design the ultimate data repository for all this stuff," and you end up with much more complicated data schemas.
It seems like such a small distinction, but starting with process first will make a huge difference in your business process outcomes.
Scott, Nice!
Scott, it is interesting!
Process = efficiency, which leads to quality data. Very astute observation Scott. See you in a few days!