Microservices and Database Replication
In a previous post, I briefly discussed the issue of data sharing in microservices. The consensus seems to be that each service must have its own database.
This is not a comfortable solution to adopt. It has been the part of microservice architectures that I have found hardest to wrap my mind around. It also seems to point to the ultra-specialization of our industry. The application architecture community (OO thinking) does not solicit the opinions of data architects (data/non-OO thinking). That's probably one reason NoSQL is succeeding. NoSQL databases seem to be part of application design, not part of enterprise database design (more on that later).
The OO modeling and design literature, dominated now by DDD (Domain-Driven Design), talks about bounded contexts as the foundation of the solution. The problem is that the basis for where to draw the boundary is not clear. Some writers advocate functional business boundaries. The basis for breaking down a domain then becomes functional partitioning, which the OO community has argued against for decades.
Bounded contexts alone don't solve the problem. In the DDD community there is a lot of talk about communication between bounded contexts, "anti-corruption" layers, CQRS, and event sourcing. For a newcomer to microservices this is all very discouraging. It moves rapidly from a promise of simplicity to a reality of deep complexity.
I think all these data architecture discussions, in the context of application design, are misplaced. These are issues of distributed databases. Application design is not the place to design a data architecture for the enterprise, based on what a small set of applications need. Besides, it makes application design for a microservice much harder than it should be.
Martin Fowler suggests a useful way to partition this problem by distinguishing between an application database and an integration database. I have not found consensus on that separation as a solution to the data problem in microservices.
In all my IT projects that involved a database, and nearly all did, the database was understood by all to be an enterprise asset. Data architects and DBAs lay out the data model outside the boundaries of any one application. In thirty years of IT work, I have never seen the enterprise data group allow an application team to design schemas. This occurs only in ORM books. No DBA will allow a developer to create a database schema in an IDE.
I have always found the notion that a database is simply a place for the persistence of application objects to be a serious oversimplification of the role of data versus the role of applications. The database has a life of its own. It is not just a place where objects go to sleep. I always prefer "data access layer" to "persistence layer". Data is out there, with or without this application, or created by many other applications. The application humbly accesses the data.
Bounded context, and database design for one microservice, will not be a concept embraced by corporate DBAs. Corporate DBAs are unlikely to accept that application modelers lay out a distributed database schema and implement it in application code (via events). I don't think any DBA will support the notion that an application owns the data. That goes against all the fundamentals that a DBA believes in.
Back to the issue at hand, and the new realities. Microservices are the chosen development direction for many new projects, as they solve thorny problems in development, deployment, and continuous delivery. But the fact remains, they create a problem with data sharing across bounded contexts.
This is, at its heart, a data replication problem. It is not an application design problem. Data architects and the database community solved this problem many years ago.
An enterprise could allow each service to own the data in its bounded context, in its own application database, but use replication outside the context of any one application to copy the updates to an integration database that is available as a read-only data source to all applications.
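The arrangement described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a real replication product: SQLite in-memory databases stand in for the application and integration databases, a trigger-fed `change_log` table stands in for whatever change capture the replication server uses, and the names `make_service_db`, `make_integration_db`, and `replicate` are hypothetical.

```python
import sqlite3


def make_service_db(table: str) -> sqlite3.Connection:
    """Create an application database owned by one service (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
    # The change log is data infrastructure; application code never touches it.
    db.execute(
        "CREATE TABLE change_log ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, row_id INTEGER, name TEXT)"
    )
    for event in ("INSERT", "UPDATE"):
        db.execute(
            f"CREATE TRIGGER log_{event.lower()} AFTER {event} ON {table} "
            "BEGIN INSERT INTO change_log (row_id, name) "
            "VALUES (NEW.id, NEW.name); END"
        )
    return db


def make_integration_db(tables: list[str]) -> sqlite3.Connection:
    """Create the shared database that other applications only read from."""
    db = sqlite3.connect(":memory:")
    for table in tables:
        db.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
    return db


def replicate(source: sqlite3.Connection, integration: sqlite3.Connection,
              table: str, last_seq: int) -> int:
    """Copy changes newer than last_seq into the integration database.

    Returns the new high-water mark so the next run resumes from there.
    """
    rows = source.execute(
        "SELECT seq, row_id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, row_id, name in rows:
        integration.execute(
            f"INSERT OR REPLACE INTO {table} (id, name) VALUES (?, ?)",
            (row_id, name),
        )
        last_seq = seq
    integration.commit()
    return last_seq


# Each service writes only to its own database.
customers = make_service_db("customers")
orders = make_service_db("orders")
integration = make_integration_db(["customers", "orders"])

customers.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
customers.execute("UPDATE customers SET name = 'Acme Corp' WHERE id = 1")
orders.execute("INSERT INTO orders (id, name) VALUES (10, 'first order')")

# The replicator runs outside both services.
customers_mark = replicate(customers, integration, "customers", 0)
orders_mark = replicate(orders, integration, "orders", 0)
```

The point of the sketch is the separation of duties: the services contain no code for sharing data, and the replicator knows nothing about business logic.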
In my personal experience, I found the concept of an application database (distinct from the enterprise database) particularly useful for long-running transactions (hours to days). Essentially, a subset of the tables was our "bounded context". Our application considered this data to be "session data". At the beginning of the long-running session, a session database instance was created by extracting data from the enterprise database. As the "case workers" worked with the session data, they updated only the session database. Only when the session was explicitly closed was the enterprise database synchronized to reflect updates from the application database. This was done with hand-crafted custom synchronizers that knew both schemas.
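A minimal sketch of that session pattern follows. It is not the original system: SQLite stands in for both databases, and the table names (`case_record`, `cases`), column names, and the functions `open_session` and `close_session` are hypothetical. The two schemas differ on purpose, to show where a synchronizer has to know both.

```python
import sqlite3


def open_session(enterprise: sqlite3.Connection,
                 case_ids: list[int]) -> sqlite3.Connection:
    """Extract the cases for one long-running session into a session database."""
    session = sqlite3.connect(":memory:")
    session.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, status TEXT)")
    placeholders = ",".join("?" for _ in case_ids)
    rows = enterprise.execute(
        "SELECT case_id, status_code FROM case_record "
        f"WHERE case_id IN ({placeholders})",
        case_ids,
    ).fetchall()
    session.executemany("INSERT INTO cases (id, status) VALUES (?, ?)", rows)
    session.commit()
    return session


def close_session(enterprise: sqlite3.Connection,
                  session: sqlite3.Connection) -> None:
    """Synchronize session updates back to the enterprise database, then close."""
    rows = session.execute("SELECT status, id FROM cases").fetchall()
    # The synchronizer maps session columns onto the enterprise schema.
    # Using the connection as a context manager commits on success
    # and rolls back if any update fails.
    with enterprise:
        enterprise.executemany(
            "UPDATE case_record SET status_code = ? WHERE case_id = ?", rows
        )
    session.close()
```

During the session, case workers would run statements such as `UPDATE cases SET status = 'RESOLVED' WHERE id = 1` against the session database only; the enterprise database sees nothing until `close_session` is called. A real synchronizer would also have to deal with rows that changed in the enterprise database in the meantime, which this sketch ignores.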
Other organizations I worked with, mainly on financial systems, used replication servers to synchronize application data with enterprise databases. The application itself was not concerned with how the data was replicated. It was not visible to application logic.
One organization I worked with solved the data ownership problem by having Tactical Data Stores (TDSs), one per application, and one Operational Data Store (ODS) with data shared among all applications. Data from multiple TDSs was replicated into the one ODS. The TDS/ODS division of data into private application data and shared enterprise data worked quite well for the most part. It had issues, which were known and tolerated.
Replication can be one solution to the data ownership problem in microservices. By isolating an event-based data sharing solution into its own data infrastructure, the development and deployment of microservices can be made simpler.
Nabil, and thus we are thankful for great tools like Pentaho's ETL product Kettle, which enables moving that data to those integrated datastores in a well-scripted and easily built process.
Nice article, Nabil. I find that those who promote the strict interpretation of each microservice owning its respective schema and data usually gloss over the CAP theorem implications. In fact, at QCon in San Francisco, the architects from a very high-profile company, who were presenting a session on microservices, said that they were still struggling with how to handle those concerns.