Data Analytics and Microservices
In today’s enterprise software development world, DevOps and microservices are the new way of working. Innovation moves so fast that if you get comfortable with your "tested and trusted" old way of doing things, you will soon be left behind.
As a data analytics enthusiast, I was curious whether any of these methodologies takes care of data. How can we manage and analyze data while following these new trends? What happens to predictive analysis, data integrity, and data consistency?
Below are a few takeaways from my research:
In the past, we built an application connected to one database, where normalized data was queried using “joins”. Then came big data and big traffic, and with them big latency. We needed to solve query latency in cases where no cache would help, because the data was simply too big.
A basic principle of microservices is that each service manages its own data. Two services should not share a data store. Instead, each service is responsible for its own private data store, which other services cannot access directly.
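As a minimal sketch of this principle, the Python example below models two services, each with its own private in-memory SQLite database. The class names (CustomerService, OrderService), table layouts, and method names are hypothetical and chosen only for illustration. In a real system the call between services would go over a network API rather than a direct method call.

```python
import sqlite3


class CustomerService:
    """Owns the customer data; nothing outside this class touches its database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")  # private data store
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add_customer(self, customer_id, name):
        self._db.execute(
            "INSERT INTO customers VALUES (?, ?)", (customer_id, name)
        )

    def get_customer(self, customer_id):
        """Public interface: the only way other services can read customers."""
        row = self._db.execute(
            "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class OrderService:
    """Owns the order data and asks CustomerService for customer details."""

    def __init__(self, customers):
        self._db = sqlite3.connect(":memory:")  # a separate private store
        self._db.execute(
            "CREATE TABLE orders "
            "(id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )
        self._customers = customers

    def place_order(self, order_id, customer_id, total):
        self._db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, total)
        )

    def order_summary(self, order_id):
        row = self._db.execute(
            "SELECT id, customer_id, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return None
        # No SQL join across services: the name comes from the owning service.
        customer = self._customers.get_customer(row[1])
        return {
            "order_id": row[0],
            "customer_name": customer["name"] if customer else None,
            "total": row[2],
        }
```

The "join" that a monolith would do in SQL now happens in application code, which is one of the trade-offs of giving each service its own store.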
Major companies that now use microservices, including eBay, Twitter, and Amazon.com, went through a database migration that started with a monolithic system. A true microservices platform requires each microservice to be responsible for its own data. Separating out a monolithic database is a repeatable process of isolating each service's data and preventing direct data access from other services.
We were told that a monolith is evil and microservices are the answer. What nobody told us is that microservices come with many pain points that derive from their distributed nature. The reason for the “One Service to One Database” rule is to avoid unintentional coupling between services, which can result if services share the same underlying data schemas. If the data schema changes, the change must be coordinated across every service that relies on that database. By isolating each service's data store, we can limit the scope of change and preserve the agility of truly independent deployments.
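The effect of isolating the schema can be sketched in a few lines of Python. The two classes below are hypothetical versions of the same service: the second one changes its private table layout, yet a consumer that only uses the public display_name() method keeps working unchanged. All names here are assumptions made for this illustration.

```python
import sqlite3


class ProfileServiceV1:
    """First version: stores the full name in a single column."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE profiles (id INTEGER PRIMARY KEY, full_name TEXT)"
        )

    def save(self, profile_id, first, last):
        self._db.execute(
            "INSERT INTO profiles VALUES (?, ?)", (profile_id, f"{first} {last}")
        )

    def display_name(self, profile_id):
        row = self._db.execute(
            "SELECT full_name FROM profiles WHERE id = ?", (profile_id,)
        ).fetchone()
        return row[0] if row else None


class ProfileServiceV2:
    """Second version: same public methods, but the private schema is split."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE profiles "
            "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
        )

    def save(self, profile_id, first, last):
        self._db.execute(
            "INSERT INTO profiles VALUES (?, ?, ?)", (profile_id, first, last)
        )

    def display_name(self, profile_id):
        row = self._db.execute(
            "SELECT first_name, last_name FROM profiles WHERE id = ?",
            (profile_id,),
        ).fetchone()
        return f"{row[0]} {row[1]}" if row else None


def greeting(service, profile_id):
    """A consumer that depends only on display_name(), never on a table."""
    return f"Hello, {service.display_name(profile_id)}!"
```

Had the consumer queried the profiles table directly, the move from full_name to first_name and last_name would have required a coordinated change in both places.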
Another reason is that each microservice may have its own data models, queries, or read/write patterns. Using a shared data store limits each team's ability to optimize data storage for their particular service.
Since data no longer comes from one source but from many, and since it no longer has a uniform shape, you need a solution that is up to the challenge. Processing data in a microservice world requires a stack that can handle streams, unstructured data, and structured data, and that can do so fast. When you run microservices, spend some time thinking about analytics too, and look for tools that can handle log streams, business events, and data lakes.
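To make the "many sources, many shapes" problem concrete, here is a small Python sketch that uses only the standard library. It accepts a mixed stream of structured JSON business events and unstructured log lines, normalizes both into one common record, and counts them per service. The log line format, the field names ("service", "type"), and the function names are assumptions made for this example. A production stack would use a stream processor such as Spark rather than a plain loop.

```python
import json
import re
from collections import Counter

# Assumed log format: "<service> <LEVEL> <free text message>"
LOG_PATTERN = re.compile(r"^(?P<service>[\w-]+) (?P<level>[A-Z]+) (?P<message>.*)$")


def normalize(raw):
    """Turn one raw record (JSON text or a plain log line) into a common dict."""
    try:
        event = json.loads(raw)
        if isinstance(event, dict):
            # Structured business event.
            return {
                "service": event.get("service", "unknown"),
                "kind": event.get("type", "event"),
            }
    except json.JSONDecodeError:
        pass  # Not JSON, so try the log line format next.

    match = LOG_PATTERN.match(raw)
    if match:
        # Unstructured log line that follows the assumed format.
        return {
            "service": match["service"],
            "kind": "log." + match["level"].lower(),
        }

    # Keep records we cannot parse so they remain visible in the counts.
    return {"service": "unknown", "kind": "unparsed"}


def count_by_service(stream):
    """Aggregate a stream of raw records into counts per (service, kind)."""
    counts = Counter()
    for raw in stream:
        record = normalize(raw)
        counts[(record["service"], record["kind"])] += 1
    return counts
```

For example, passing a list that contains '{"service": "orders", "type": "order_placed"}' and 'billing ERROR payment timeout' to count_by_service yields one count under ("orders", "order_placed") and one under ("billing", "log.error").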
Learn more about using tools like Apache Spark and HDInsight:
Apache Spark - http://spark.apache.org/
HDInsight - https://azure.microsoft.com/en-us/services/hdinsight/
References:
A modern stack for data analysis in a microservice world - http://fizzylogic.nl/2017/02/10/a-modern-stack-for-data-analysis-in-a-microservice-world/
Data consistency across microservices - https://medium.com/@deniseschlesinger/data-consistency-across-microservices-4f768b253816
Data Integration Design Patterns with Microservices - https://blogs.technet.microsoft.com/cansql/2016/12/05/data-integration-design-patterns-with-microservices/