Microservices - Performance Evaluation, Testing and Deployment
Performance Evaluation Of Microservices
The overall health of a microservice determines its performance. Before we can list the metrics used to measure microservice performance, we need to define what to measure, and at what levels and in which areas to take the measurements. The general health of a system or service comprises several factors, and each of these can be measured using one or more of the following metrics and techniques.
Factors that constitute the health of a Microservice (what to measure)
- Reliability
- Throughput (Number of requests or transactions per second)
- Service response time (The time between client sending a request and receiving a response)
- Availability
- Fault Tolerance
- Saturation (Or the amount of load on the service)
- Scalability
- Latency (Duration for which a request is waiting to be handled)
- Resiliency/Recoverability (example - Chaos Monkey by Netflix)
- Security (OWASP microservice security threats [5], authentication, authorization, data encryption at rest and in transit, network security, ACLs, resistance to denial-of-service attacks, etc.)
- Statefulness v/s statelessness (i.e. is the Microservice able to honor the stateful or stateless nature of its transactions?)
- SLA/SLI/SLO contracts
- Number and severity of Errors/Failures
- System up-time v/s downtime
- Number of Error Budget breaches [6]
- Hardware/node resources health
- Staleness of data returned by the service, or application lag that lies beyond the acceptable standards for the service (where applicable)
o Example – data staleness caused by lag at the database layer due to replication across read/write replicas, network delays, etc.
Two levels of Microservice health evaluation:
MONITORING
- End user experience monitoring
- Service interaction monitoring
- End-to-end performance monitoring
- Service health monitoring
OBSERVABILITY
Both Monitoring and Observability can be implemented at various levels of the ‘Observability Maturity Model’ [1] using different tools and techniques.
We can either add instrumentation directly to the microservice code or use third-party tools to set up monitoring and observability (e.g. APM libraries from Elasticsearch or New Relic, Logstash, custom or third-party instrumentation APIs, etc.).
Performance metrics to look for to measure the health of a Microservice:
Service Response Time
The time between a client sending a request and receiving a response.
Throughput
The number of requests or transactions served per second.
Scalability
Measured via the number of transactions per second, the number of requests per second, and transaction latency as load increases.
Saturation (Or the amount of load on the service)
Metrics provided by the RED method, Golden signals, APM and Observability.
Latency
Metrics provided by utilizing Instrumentation techniques, RED method, Golden signals, APM and Observability.
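The RED method [11] tracks Rate, Errors, and Duration per service. A minimal sketch of computing all three from a window of request records (the sample data and window length are illustrative):

```python
# RED method over a window of request records:
#   Rate   – requests per second
#   Errors – failed requests per second (5xx here)
#   Duration – latency distribution (median shown)
requests = [
    # (duration_seconds, http_status)
    (0.120, 200), (0.045, 200), (0.300, 500), (0.080, 200), (0.055, 404),
]
window_seconds = 10.0

rate = len(requests) / window_seconds
errors = sum(1 for _, status in requests if status >= 500) / window_seconds
durations = sorted(d for d, _ in requests)
p50 = durations[len(durations) // 2]

print(f"rate={rate:.1f} req/s, errors={errors:.2f} err/s, p50={p50 * 1000:.0f} ms")
```

In practice these numbers come from a metrics pipeline rather than an in-memory list, but the three quantities are the same.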
Number and severity of Errors/Failures
Both at the service mesh level and individual service level.
Service downtime v/s up-time ratio
As the name suggests, this ratio directly impacts availability.
SLIs, SLOs and Error Budgets (to measure service reliability, availability, etc.) [6]
o SLI – Service Level Indicator: a measured metric, e.g. the fraction of successful requests
o SLO – Service Level Objective: the target value or range for an SLI
o Error Budgets – the amount of SLO violation that can be tolerated over a given window
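The relationship between an SLI, an SLO and the error budget [6] can be shown with a small worked example (all numbers are illustrative):

```python
# Error budget from an SLO, following the Google SRE formulation [6]:
# the SLI is the measured ratio of good events, the SLO is the target,
# and the error budget is the allowed fraction of bad events (1 - SLO).
slo = 0.999            # target: 99.9% of requests succeed
total_requests = 1_000_000
failed_requests = 420  # measured over the same window

sli = (total_requests - failed_requests) / total_requests
budget_total = (1 - slo) * total_requests      # ~1,000 allowed failures
budget_left = budget_total - failed_requests   # ~580 failures remaining

print(f"SLI={sli:.4%}, budget used={failed_requests / budget_total:.0%}")
```

When `budget_left` approaches zero, SRE practice is to freeze risky releases until reliability recovers.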
APDEX score [3]
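The APDEX score [3] grades observed latencies against a target threshold T: requests at or under T are "satisfied", those under 4T are "tolerating", the rest are "frustrated", and the score is (satisfied + tolerating/2) / total. A minimal sketch (threshold and samples are illustrative):

```python
# Apdex = (satisfied + tolerating / 2) / total, a value between 0 and 1.
T = 0.5  # seconds; the threshold is chosen per service
latencies = [0.1, 0.3, 0.4, 0.9, 1.2, 2.5]  # illustrative response times

satisfied = sum(1 for lat in latencies if lat <= T)
tolerating = sum(1 for lat in latencies if T < lat <= 4 * T)
apdex = (satisfied + tolerating / 2) / len(latencies)

print(f"Apdex = {apdex:.2f}")  # → Apdex = 0.67
```

A score near 1.0 means nearly all users are satisfied; dashboards typically alert when it drops below a custom acceptable threshold.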
Platform metrics
Monitoring platform metrics is critical to keeping the microservices infrastructure running smoothly. These include the following:
Resource metrics
Golden signals/metrics
Metrics related to the RED method
Security related metrics
OWASP microservice security threats [5]: authentication, authorization, data encryption at rest and in transit, network security, ACLs, resistance to denial-of-service attacks, etc.
Distributed tracing in case of Service Mesh [9]
Health of a service (bad v/s good performance?)
Metrics recommended by LinkedIn:
- Requests per second, or throughput, handled by the Service Mesh (and also by each microservice)
- Maximum number of concurrent requests per second
- Ratio of reads v/s writes to a database
Levels at which to measure:
- Service Mesh level (overall health of the Service Mesh rather than of individual services)
- Microservice level
- Application level (relevant low-level metrics within Microservices)
Other comments:
- Application Performance Monitoring (APM) and dashboards – APDEX score, custom acceptable thresholds, etc.
- Example third-party APM libraries are provided by Elasticsearch, New Relic, etc.
Testing Of Microservices
Types of Microservice testing
Base testing
Load testing
- 80% of the effects derive from 20% of the causes (the Pareto principle)
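A minimal closed-loop load test can be sketched as firing a batch of requests at a handler and reporting throughput and tail latency. The `handler` below is a stand-in that simulates 1 ms of work; real load tests would drive an actual endpoint with a dedicated tool such as k6 or JMeter [14].

```python
import time

def handler():
    time.sleep(0.001)  # simulate 1 ms of service work

def load_test(n_requests):
    """Fire n sequential requests; return throughput and p95 latency."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        handler()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_rps": n_requests / elapsed,
        "p95_seconds": latencies[int(0.95 * (n_requests - 1))],
    }

print(load_test(100))
```

Following the 80/20 observation above, the endpoints that receive most of the traffic are the ones worth load-testing first.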
Resiliency testing
Testing strategies for Microservices [16][17][18]
Unit testing
o Solitary unit tests
o Sociable unit tests
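The solitary/sociable distinction can be shown with Python's `unittest.mock`. The `PriceService` and `TaxCalculator` classes here are hypothetical, invented only to illustrate the two styles:

```python
from unittest.mock import Mock

class TaxCalculator:
    def tax_for(self, amount):
        return amount * 0.2

class PriceService:
    def __init__(self, tax_calculator):
        self.tax_calculator = tax_calculator

    def total(self, amount):
        return amount + self.tax_calculator.tax_for(amount)

# Solitary unit test: the collaborator is replaced with a mock,
# so only PriceService's own logic is exercised.
def test_total_solitary():
    calc = Mock()
    calc.tax_for.return_value = 5.0
    assert PriceService(calc).total(100) == 105.0

# Sociable unit test: the real collaborator participates in the test.
def test_total_sociable():
    assert PriceService(TaxCalculator()).total(100) == 120.0

test_total_solitary()
test_total_sociable()
```

Solitary tests pinpoint failures precisely; sociable tests also catch integration bugs between closely related classes.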
Contract testing
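The idea behind consumer-driven contract testing can be sketched as the consumer pinning exactly the fields and types it relies on, and the provider's response being checked against that contract in CI. Real contract testing uses dedicated tools such as Pact; this sketch (with hypothetical field names) only illustrates the principle:

```python
# The contract: fields and types the consumer depends on.
contract = {"id": int, "status": str}

def satisfies_contract(response, contract):
    """True if the response carries every contracted field with the right type."""
    return all(
        field in response and isinstance(response[field], expected_type)
        for field, expected_type in contract.items()
    )

# Extra fields are fine; the consumer simply ignores them.
provider_response = {"id": 42, "status": "shipped", "extra": "ignored"}
print(satisfies_contract(provider_response, contract))  # → True
```

Because only the consumer's actual needs are checked, the provider stays free to evolve everything else in its responses.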
Integration testing
Component testing
End-to-end testing
Feature flags
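Feature flags gate code paths at runtime, so new code can ship disabled and be turned on (or rolled back) without a redeploy. A minimal sketch with a hypothetical flag name:

```python
# In production the flag store would be a config service or flag platform;
# a dictionary stands in for it here.
flags = {"new_checkout_flow": False}

def checkout(cart_total):
    if flags.get("new_checkout_flow"):
        return f"v2 checkout: {cart_total}"
    return f"v1 checkout: {cart_total}"

print(checkout(30))                  # → v1 checkout: 30
flags["new_checkout_flow"] = True    # flipped at runtime, no redeploy
print(checkout(30))                  # → v2 checkout: 30
```

The same mechanism underlies dark launching and canary-style rollouts, where the flag is enabled for only a subset of users.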
Traffic shadowing
A/B Testing
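A/B testing needs each user to land deterministically in the same variant across requests, which is commonly done by hashing a stable user identifier. A sketch (experiment name and split are illustrative):

```python
import hashlib

def variant(user_id, experiment="checkout_button", b_share=0.5):
    """Deterministically assign a user to variant A or B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "B" if bucket < b_share else "A"

# The same user always gets the same variant:
print(variant("user-123") == variant("user-123"))  # → True
```

Hashing the experiment name together with the user id keeps bucket assignments independent across experiments.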
Deployment Of Microservices
The steps to deploy a microservice depend on the deployment strategy being used.
Types of Microservice deployment strategies [19][16]
Blue-Green deployment
- Blue – runs the old (current) code and actively serves traffic
- Green – runs the new code and stays idle until traffic is switched over to it
Canary deployment
The new version initially receives only a small percentage of live traffic; if it behaves well, its share is gradually increased, otherwise traffic is rolled back to the stable version.
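A canary rollout routes a small, configurable share of traffic to the new version while the rest stays on the stable one. A weighted-router sketch (the 5% share and version names are illustrative):

```python
import random

CANARY_SHARE = 0.05  # 5% of traffic goes to the canary

def choose_version(rng=random.random):
    return "v2-canary" if rng() < CANARY_SHARE else "v1-stable"

# Simulate routing 10,000 requests:
counts = {"v1-stable": 0, "v2-canary": 0}
random.seed(7)  # fixed seed so the simulation is repeatable
for _ in range(10_000):
    counts[choose_version()] += 1
print(counts)  # roughly 95% stable / 5% canary
```

In practice the weighting is done by a service mesh or load balancer, and the canary's error rate and latency are watched before ramping the share up.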
Dark launching
Staged release
The staged release deployment strategy for microservices involves gradually releasing microservices to one environment at a time. For example, the
development team first releases the microservices to the testing environment and later to production.
Rolling deployment
Instances running the old code are replaced with instances running the new code a few at a time, so the service as a whole remains available throughout the rollout. Services that do not have to be constantly available can instead simply swap in the new code and be restarted.
A/B testing (to determine which features to deploy)
As listed under the testing strategies above, A/B testing helps determine which of two feature variants should be rolled out.
References
[1] Cloud Native Observability with AWS - https://www.youtube.com/watch?v=UW7aT25Mbng
[2] What is a Service Mesh - https://www.nginx.com/blog/what-is-a-service-mesh/
[3] APDEX score for measuring Service Mesh health - https://tetrate.io/blog/the-apdex-score-for-measuring-service-mesh-health
[4] Designing Data Intensive Applications - https://dataintensive.net/
[5] OWASP Microservice Security Threats - https://lalverma.medium.com/microservices-owasp-security-threats-eabcd836e08b
[6] Google SRE Error Budgets - https://cloud.google.com/blog/products/management-tools/sre-error-budgets-and-maintenance-windows
[7] The Art Of Service SLOs by Google - https://sre.google/resources/practices-and-processes/art-of-slos/
[8] Observability in AWS - https://aws.amazon.com/blogs/big-data/part-1-microservice-observability-with-amazon-opensearch-service-trace-and-log-correlation/
[9] Istio Observability - https://istio.io/latest/docs/concepts/observability/
[10] Service Mesh monitoring - https://sysdig.com/blog/monitor-istio/
[11] RED method - https://www.infoworld.com/article/3638693/the-red-method-a-new-strategy-for-monitoring-microservices.html
[12] Latency analysis for microservices - https://www.garudax.id/pulse/latency-analysis-microservices-evrim-%C3%B6z%C3%A7elik/
[13] Netflix Chaos Monkey - https://netflix.github.io/chaosmonkey/
[14] When should I start load testing? - https://techbeacon.com/app-dev-testing/when-should-i-start-load-testing
[15] Canary deployments, A/B testing and microservices - https://blog.getambassador.io/canary-deployments-a-b-testing-and-microservices-with-ambassador-f104d0458736
[16] 5 testing strategies for deploying Microservices - https://devops.com/5-testing-strategies-for-deploying-microservices/
[17] Testing Microservices 12 useful techniques - https://www.infoq.com/articles/twelve-testing-techniques-microservices-intro/
[18] Testing strategies for microservices - https://semaphoreci.com/blog/test-microservices
[19] Microservice deployment patterns that improve availability - https://www.opslevel.com/blog/4-microservice-deployment-patterns-that-improve-availability